
    shy-midnight-40599

    09/10/2025, 4:25 PM
    Hi Team, we are deploying Metaflow using Step Functions / AWS Batch (with a Fargate-based compute environment). We are running multiple executions of the same flow to do a load test, and we noticed that the job running times are abnormal. Some jobs took around 2 to 3 minutes to complete, while others show a runtime of 2 hours. When we checked the logs, we could see the step completed in 2 minutes (based on the logs we added to the step); after the step completed, the job kept running for 2 hours for no apparent reason. Has anyone else faced this, or any idea why? Let me know if you need more details.

    happy-journalist-26770

    09/09/2025, 12:58 PM
    Hi, the Metaflow UI isn't displaying logs lately. I'm running on K8s, deployed via helm. metaflow_version: 2.18.3. Images:
    • public.ecr.aws/outerbounds/metaflow_ui:1.3.5-146-ge6d68f08-obp
    • public.ecr.aws/outerbounds/metaflow_metadata_service:2.5.0
    • public.ecr.aws/outerbounds/metaflow_metadata_service:2.5.0
    Do let me know if I'm missing anything.

    fast-vr-44972

    09/09/2025, 12:07 PM
    I don't think you can pass a custom virtual env to `pypi`.

    fast-vr-44972

    09/09/2025, 12:01 PM
    Most probably it's a virtual env mismatch. `pypi` seems to be managing its own virtual env: https://github.com/Netflix/metaflow/blob/master/metaflow/plugins/pypi/pypi_decorator.py#L35

    quick-carpet-67110

    09/09/2025, 11:19 AM
    Question about using a custom `image` in the `@kubernetes` decorator together with the `@pypi` decorator. Hey everyone! We have a situation where most of our steps share a lot of packages but still require custom installations every now and then. So we have a base Docker image built with all of the common dependencies, but we would like to use the `@pypi` decorator to install the custom deps on the fly. Is this currently possible? I did a quick and dirty example flow with a custom base image and a custom dependency installed in the `@pypi` decorator, and the code inside the step was not able to import PyTorch, even though it is available in the custom image.
    @kubernetes(
        tolerations=[{"key": "something", "operator": "Equal", "value": "another_value", "effect": "NoSchedule"}],
        gpu=1,
        image="pytorch/pytorch:2.0.1-cuda11.7-cudnn8-runtime",
    )
    @pypi(
        python="3.10.0",
        packages={
            "implicit": "0.7.2",
        },
    )
    @step
    def gpu(self):
        ...  # step body elided; `import torch` fails here even though torch is in the image
    I searched the docs and was able to find some information, but I am not sure whether the "system-wide packages" mentioned in that snippet refer to the container image's packages or something else. Can anyone shed some light on whether the setup I am describing above is achievable with Metaflow? Thank you!
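    If, as the replies above suggest, `@pypi` resolves its own isolated environment and does not reuse the image's site-packages, one possible workaround is to declare everything the step imports in the `@pypi` spec itself rather than relying on the base image. A minimal sketch with a hypothetical flow name, tolerations omitted, and the torch pin assumed to match the image tag:
    from metaflow import FlowSpec, kubernetes, pypi, step


    class GpuFlow(FlowSpec):

        @step
        def start(self):
            self.next(self.gpu)

        @kubernetes(gpu=1, image="pytorch/pytorch:2.0.1-cuda11.7-cudnn8-runtime")
        @pypi(
            python="3.10.0",
            packages={
                "implicit": "0.7.2",
                # assumption: torch has to be listed here as well, since the
                # isolated @pypi environment does not see the image's packages
                "torch": "2.0.1",
            },
        )
        @step
        def gpu(self):
            import torch  # resolved from the @pypi environment, not the base image
            print(torch.__version__)
            self.next(self.end)

        @step
        def end(self):
            pass


    if __name__ == "__main__":
        GpuFlow()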

    ancient-fish-13211

    09/03/2025, 1:40 PM
    Hi again everyone, does anyone know of a way to share a local docker image with the minikube setup that metaflow-dev sets up, so that I can test with it in flows? I've tried multiple ways that have all failed, and without access to the minikube commands directly I can't see a way to do it. Thanks

    hundreds-wire-22547

    09/02/2025, 11:20 PM
    Hi, I upgraded pydantic from `version = "2.10.5"` to `version = "2.11.7"` and now I'm seeing an error like the one below. Is this a known issue?
    File "/tmp/ray/session_2025-06-16_12-59-52_860923_1/runtime_resources/working_dir_files/_ray_pkg_e01e7abcca487ccc/metaflow/datastore/task_datastore.py", line 369, in load_artifacts
        yield name, pickle.loads(blob)
                    ^^^^^^^^^^^^^^^^^^
    AttributeError: 'FieldInfo' object has no attribute 'evaluated'
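    For context, Metaflow artifacts are unpickled with `pickle.loads` as the traceback shows, and pickles that capture pydantic internals (such as `FieldInfo`) by value are not guaranteed to load cleanly under a different pydantic release than the one that wrote them. A minimal sketch outside Metaflow of the kind of mismatch that can produce this (hypothetical model name, versions taken from the message above):
    import pickle

    from pydantic import BaseModel


    class TrainingConfig(BaseModel):  # hypothetical model stored as a flow artifact
        learning_rate: float = 0.01


    # Anything that captures pydantic internals by value ends up in the pickle,
    # e.g. the FieldInfo objects in model_fields:
    blob = pickle.dumps(TrainingConfig.model_fields)

    # Loading that blob under a different pydantic release (2.10.5 writer vs.
    # 2.11.7 reader) can fail if the newer code expects attributes the old
    # objects never had, e.g.
    # AttributeError: 'FieldInfo' object has no attribute 'evaluated'
    fields = pickle.loads(blob)
    If that is what's happening here, keeping the pydantic version that loads old artifacts matched to the one that wrote them (or regenerating the artifacts after the upgrade) would be the usual way around it.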

    clever-midnight-3739

    09/02/2025, 4:22 PM
    Hi everyone! I am new to Metaflow and trying to understand how to deploy a flow using remote resources on different compute backends. In this tutorial, there is a flow with steps assigned either locally, on AWS Batch, or on remote K8s. How is Metaflow set up in this case? What does the config.json look like to support both AWS Batch and K8s? Also, in cases where this flow is deployed (for instance, on Argo), on which compute would each of these steps run? Thank you very much for your help!
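    A minimal sketch of the mixed-backend pattern that kind of tutorial describes, with placeholder step names and resource values: steps with no compute decorator run wherever the flow is launched from, while `@batch` and `@kubernetes` send individual steps to AWS Batch and Kubernetes respectively.
    from metaflow import FlowSpec, batch, kubernetes, step


    class MixedComputeFlow(FlowSpec):  # hypothetical flow for illustration

        @step
        def start(self):
            # no compute decorator: runs wherever the flow is launched from
            self.next(self.on_batch)

        @batch(cpu=2, memory=4096)        # runs on AWS Batch
        @step
        def on_batch(self):
            self.next(self.on_k8s)

        @kubernetes(cpu=2, memory=4096)   # runs on the configured Kubernetes cluster
        @step
        def on_k8s(self):
            self.next(self.end)

        @step
        def end(self):
            pass


    if __name__ == "__main__":
        MixedComputeFlow()
    For this to work, the config.json presumably needs both the Batch settings (e.g. METAFLOW_BATCH_JOB_QUEUE and the associated IAM role) and the Kubernetes settings (e.g. METAFLOW_KUBERNETES_NAMESPACE), alongside the shared datastore and metadata service settings.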

    adorable-truck-38791

    09/01/2025, 1:55 PM
    hello, I'm trying to use the `metaflow-dev up` command... it seems to be mostly working, but it keeps asking for my password when it's starting all of the services. The weirder thing is that it keeps saying my password is wrong, so I'm not even sure which password it's asking for (is it something related to the minikube/argo roles or something like that? I have no idea). Any thoughts on what I should try to fix this?

    crooked-camera-86023

    08/29/2025, 10:51 PM
    Hello, I tried to apply metaflow.tf to our own internal infra and got the following error. I'd really appreciate any pointers/help.

    ancient-fish-13211

    08/29/2025, 10:04 AM
    Hi everyone, I'm having a bit of a silly problem using metaflow as a dependency in a PyCharm project. When trying to import things like `from metaflow import FlowSpec, step, kubernetes, retry`, FlowSpec and step import fine, but I get "Cannot find reference" errors for kubernetes and retry. If I launch a Python console or a notebook, the imports work fine, so it seems like an indexing issue. I've tried the typical "invalidate caches" with no luck. I'd rather not just disable the warnings if possible. Has anyone had similar issues or found a solution? Many thanks
    ✅ 1

    dry-beach-38304

    08/28/2025, 7:39 AM
    For anyone using the Metaflow Netflix Extensions: they have been updated to 1.3.0 (compatible with the new packaging framework introduced in Metaflow 2.16.0). Apologies for taking so long to update; a bug in Mamba 2.3.1 was preventing the tests from running correctly (2.3.2 was released two days ago) and I wanted to make sure things were not too broken. Lots of bug fixes and a few new features. It is not compatible with Metaflow < 2.16.0. More features coming soon as well.
    excited 1
    💯 1

    narrow-forest-28560

    08/27/2025, 11:12 PM
    Hey everyone, there are a lot of encouraging new developments (our company has been mostly using Metaflow < 2.12). I wonder if it is now possible, with some reusable utility interfaces, to wrap any single function with a decorator to make it into a runnable flow without having to write a whole separate flow script. In particular, we have been using the Runner API in Airflow DAGs to run flows on AWS Batch. Furthermore, it makes sense as part of this to be able to unit test such functions. I'm encouraged by the latest Metaflow 2.18 blog post, but if the entry point were as simple as a RESTful endpoint, say via FastAPI, it would be an encouraging push to update to newer Metaflow versions (with internal training).
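    On the Runner point: the Runner API can at least launch an existing flow file programmatically and block until it finishes, which is the usual building block for wrapping flows in Airflow tasks or a FastAPI endpoint. A rough sketch, where the flow file name and the alpha parameter are placeholders:
    from metaflow import Runner

    # Launch an existing flow file from Python (e.g. inside an Airflow task)
    # and wait for it to complete.
    with Runner("train_flow.py").run(alpha=0.1) as running:
        print(f"{running.run} finished with status: {running.status}")
    As far as I know, turning a bare function into a flow still requires a flow file somewhere; the function itself can stay importable and unit-testable on its own, with a thin FlowSpec calling it.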

    square-wire-39606

    08/27/2025, 9:29 PM
    conditionals are now GA
    💯 1
    ❤️ 1
    🙌 1

    square-wire-39606

    08/27/2025, 9:28 PM
    old thread but conditionals are now GA

    square-wire-39606

    08/27/2025, 9:27 PM
    old thread, but finally we got around to implementing it

    calm-rainbow-82717

    08/26/2025, 7:41 PM
    Hey everyone, I have a question regarding CLI commands in Metaflow. Is there a way to add custom CLI commands after the python file, e.g. `myflow.py`? I want to create something like `python myflow.py data check` and `python myflow.py data plan` next to the existing ones like `python myflow.py run` and `python myflow.py show`. Any idea if it's possible to do this? I see it seems possible to use the metaflow-extension-template? And I also wonder if there's some other way to achieve the goal, like customizing step decorators using a generator function. Thanks in advance!
    ✅ 1

    hundreds-receptionist-20478

    08/26/2025, 7:11 PM
    👋🏻 Hey everyone! I'm an experienced AI Agent Developer open to new projects or full-time roles. I specialize in building autonomous agents using GPT-4, LangChain, AutoGen, CrewAI, and other advanced frameworks. What I Do: • Autonomous research & data-gathering bots • Multi-agent systems for delegation & collaboration • AI assistants with memory, planning & tool use • Trading bots, IVR agents, customer support agents & more Tech Stack: • Python, TypeScript, Go, C++ • LangChain, LangGraph, AutoGen, ReAct, CrewAI • OpenAI, Claude, Hugging Face, Playwright, API integrations I'm especially interested in ambitious startups, Web3 projects, and next-gen AI tools. Feel free to reach out if you're building something exciting; happy to chat!
    👋 2
    👋🏼 1

    great-egg-84692

    08/26/2025, 5:02 PM
    Does Metaflow support configuring `concurrencyPolicy` for Argo CronWorkflows? https://argo-workflows.readthedocs.io/en/latest/cron-workflows/

    few-dress-69520

    08/26/2025, 11:17 AM
    I'm running into a problem when trying to resolve named environments with packages from a private PyPI repository (AWS CodeArtifact). I want to use pip environment variables to pass the extra-index-url, e.g. through PIP_EXTRA_INDEX_URL. My understanding from other posts here is that what I'm trying to do should work. I've tried
    PIP_EXTRA_INDEX_URL=<url_with_temporary_token> metaflow environment resolve -r requirements.txt --alias test_env
    which fails already in the first step of resolving the environment. It just doesn't have access to the private repo and fails to resolve any private packages.
    ERROR: Could not find a version that satisfies the requirement <private-package>==0.1 (from versions: none)
    ERROR: No matching distribution found for <private-package>==0.1
    Strangely, when creating a pip.conf that contains the extra-index-url, it almost works. When running
    PIP_CONFIG_FILE=pip.conf metaflow environment resolve -r requirements.txt --alias test_env
    Metaflow is able to resolve the environment including the private packages and their dependencies, but in the step where it downloads the packages from the web, I get a `401 Client Error: Unauthorized for url:` for the private repo. It looks like when trying to download from the web it doesn't use the pip.conf anymore but instead tries to directly access the url prepared earlier in the process (without the token) and hence fails. I see that there is some auth handling here but this doesn't seem to do the thing that's necessary for my use case. I'm using metaflow==2.15.21 and metaflow-netflixext==1.2.3.

    acoustic-river-26222

    08/23/2025, 6:03 PM
    Hi everyone!! I am running `netflixoss/metaflow_metadata_service:v2.4.12` for the UI service startup. When running the command I get `"/opt/latest/bin/python3 -m services.ui_backend_service.ui_server": stat /opt/latest/bin/python3 -m services.ui_backend_service.ui_server: no such file or directory`. Do you know if the path of the container init script changed? 😁
    ✅ 1

    bland-garden-80695

    08/23/2025, 12:04 AM
    Hey all, while I work on testing the decorators with an agentic workflow, I had some generic questions for the team. 1. Who are the primary users? Does Metaflow intend to keep catering to them, or expand to other professions/fields in the future? 2. In which direction is Metaflow moving at this stage? What is the vision for the product?
    ✅ 1

    adorable-truck-38791

    08/22/2025, 2:30 PM
    hey Metaflow team, I wanted to see whether it's feasible to do something like version and track the packages built for each flow (not the runs, but rather the code backing the different runs). Basically, I have an org where there are people who write code around statistics and other math functions but are not engineering savvy. With this in mind, I wanted to see whether I could make something like the following:
    1. I define a somewhat abstract flow that has static inputs and outputs
    2. Someone else would just give me a function that adheres to the static interface (a simple statistical function that takes some well-defined data inputs and produces some well-defined outputs)
    3. I'm able to run the same higher-level flow, where it just handles some of the data input handling and the function output handling in a consistent way
    4. I build some index of the packages built & run for the different custom functions given, so that I can:
       a. re-run that specific package with different data inputs potentially
       b. produce some index of the packages and runs
    The main goal here is to try to close the gap between the more engineer-y interface of Metaflow and the technical capabilities of people who are more math/statistics focused... but I think there are a few things that are hard:
    1. Dynamically creating a `FlowSpec` subclass such that one of the steps calls whatever random function gets thrown into the mix (see the sketch below)
    2. Managing the underlying packages backing the different runs
    I might be way over-complicating this, so I would appreciate any thoughts or pointers! I'm happy to dig into the code and work through some of the internal APIs for this. I do realize there are security concerns with executing arbitrary functions in this manner, but I think that is manageable in the environment we work in
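    On point 1, a very rough sketch of one possible shape (not an endorsement of the security model): instead of generating a `FlowSpec` subclass, keep a fixed wrapper flow that looks the user-supplied function up by a dotted path at runtime, so the flow code never changes while the function does. All names below are hypothetical.
    import importlib

    from metaflow import FlowSpec, Parameter, step


    class GenericStatsFlow(FlowSpec):  # hypothetical wrapper flow

        # "module:function" path of the user-supplied function; the default is
        # just a stand-in so the sketch runs as-is.
        func = Parameter("func", help="module:function to run", default="statistics:mean")

        @step
        def start(self):
            # the static, well-defined data input handling would live here
            self.data = [1, 2, 3, 4]
            self.next(self.run_func)

        @step
        def run_func(self):
            module_name, func_name = self.func.split(":")
            fn = getattr(importlib.import_module(module_name), func_name)
            self.result = fn(self.data)  # the static output contract
            self.next(self.end)

        @step
        def end(self):
            print("result:", self.result)


    if __name__ == "__main__":
        GenericStatsFlow()
    Indexing the packages and runs could then be done with run tags or by recording the function's version as an artifact.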

    brash-gold-6157

    08/21/2025, 2:49 PM
    Hi all, I'm new to metaflow and trying to help set up a PoC environment. We already have Argo Workflows deployed, and multiple Kubernetes clusters in Azure and on-prem. I don't really want to just run the pre-built Terraform config as we already have many of the resources we need, including storage accounts etc. Can anyone point me to documentation on this? Should I be starting with the helm charts here? https://github.com/outerbounds/metaflow-tools/tree/master/charts/metaflow Many thanks!
    ✅ 1

    enough-article-90757

    08/20/2025, 8:58 PM
    In the event-trigger docs, there's a mention of setting an event trigger via the config. But when I follow the example given, I get a Python error saying that the `config` symbol could not be found. Are the docs up to date? I also found https://github.com/outerbounds/config-examples/blob/83ece9c8e916f4d5f549e7bb717cbfca80b5b555/flow-level/toplevel.py#L15 which indicates using `config_expr`.
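    For reference, the pattern in that linked example pairs a `Config` object with `config_expr` so the decorator argument is resolved from the config file; a rough sketch assuming a Metaflow recent enough to ship `Config`/`config_expr` (2.13+), with placeholder names throughout:
    from metaflow import Config, FlowSpec, config_expr, step, trigger


    # "myconfig" and "trigger_event" are placeholders; config.json would contain
    # something like {"trigger_event": "data_updated"}.
    @trigger(event=config_expr("myconfig.trigger_event"))
    class TriggeredFlow(FlowSpec):

        myconfig = Config("myconfig", default="config.json")

        @step
        def start(self):
            self.next(self.end)

        @step
        def end(self):
            pass


    if __name__ == "__main__":
        TriggeredFlow()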
    ✅ 1

    cool-businessperson-43467

    08/19/2025, 3:31 PM
    Hi, could someone please clarify: does anybody have an example of connecting Metaflow with Argo Workflows, without Argo Events, at the moment?

    adventurous-australia-32236

    08/16/2025, 9:20 PM
    Hey all! Could you ballpark the cost of deploying Metaflow on GCP? For a deployment just sitting idle, that is. Thanks!

    hundreds-rainbow-67050

    08/14/2025, 5:53 PM
    PSA: Outerbounds Office Hours
    We've got two great talks coming up:
    • Michael Bao (Netflix) – 📆 Aug 19 ⏰ 9am PT
    ◦ Building a scalable GPU training service with Metaflow. Learn how they migrated PyTorch Lightning scripts with zero model/data changes, added custom decorators for configurability, and deploy seamlessly via CI/CD.
    ◦ 📍 RSVP: http://lu.ma/office-hours-with-netflix
    • Gergely Daróczi (Spare Cores) – 📆 Sep 2 ⏰ 9am PT
    ◦ Right-sizing compute for AI/ML workloads. See their open-source `@track_resources` package in action, with interactive reports that optimize CPU/GPU usage and cut cloud costs, plus a live demo.
    ◦ 📍 RSVP: http://lu.ma/office-hours-with-spare-cores
    ✨ Why attend? Outerbounds Office Hours are a space to:
    • 💡 Learn practical strategies from real-world Metaflow use cases
    • 🤝 Connect directly with the Outerbounds team & ML/infra community
    • 🧰 Discover patterns you can reuse in your own workflows
    • 🚀 Stay inspired by how others are solving cutting-edge challenges
    🙋 Want to present? Presenting is a chance to showcase your work, get feedback from peers, and build visibility in the ML/infra ecosystem. Reach out if you'd like to share your story in a future session!
    🔥 1
    excited 1
    👀 3
    🎉 1
    👍 1

    bland-garden-80695

    08/13/2025, 11:21 PM
    Hey, does Metaflow support cycles? My 2 cents says no, as it is a DAG. I want to test a reasoning/feedback-based AI agent with the system below. This system covers both conditions and looping.

    wide-butcher-68570

    08/13/2025, 4:07 PM
    Hey 👋 I am building a POC for my company to demo an Analytics Processing Workflow using Metaflow. The steps in the flow require access to an existing S3 bucket. If I just do `aws sso login` and then run the flow, Metaflow is able to pick up the AWS credentials OK. But I am having trouble when trying to run the same flow using the `metaflow-dev` stack, because the pods can't pick up the AWS credentials from the process when I run the flow in `metaflow-dev shell`. Wondering if anyone has been able to mount AWS credential Secrets from a local shell into pods spun up by `metaflow-dev`. I am trying to set up a good DevExp for the engineering org. Thanks!
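    One thing that might be worth trying (a sketch, not a confirmed metaflow-dev recipe): create a Kubernetes secret with the AWS credentials in the dev cluster and expose it to step pods via the `secrets` argument of `@kubernetes`, which surfaces the secret's keys as environment variables. The secret and bucket names below are placeholders, and short-lived SSO credentials would need to be refreshed in the secret when they expire.
    from metaflow import FlowSpec, kubernetes, step

    # Assumes a secret was created in the dev cluster beforehand, e.g.:
    #   kubectl create secret generic aws-creds \
    #       --from-literal=AWS_ACCESS_KEY_ID=... \
    #       --from-literal=AWS_SECRET_ACCESS_KEY=... \
    #       --from-literal=AWS_SESSION_TOKEN=...


    class S3DemoFlow(FlowSpec):  # hypothetical flow name

        @kubernetes(secrets=["aws-creds"])  # secret keys become env vars in the pod
        @step
        def start(self):
            from metaflow import S3
            with S3(s3root="s3://my-existing-bucket/demo/") as s3:  # placeholder bucket
                print([obj.key for obj in s3.list_paths()])
            self.next(self.end)

        @step
        def end(self):
            pass


    if __name__ == "__main__":
        S3DemoFlow()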