  • broad-accountant-50293

    05/09/2025, 7:10 PM
    Hey! I'm in DevOps but have proposed to the Data Science team at my company that we switch from SageMaker to Metaflow hosted on EKS for better scaling and pipeline orchestration. I'm meeting with the Data Science team soon. Does anybody have a PowerPoint or anything they can share to increase my chances of convincing them 😅
  • hundreds-policeman-70711

    05/09/2025, 5:50 PM
    Hello, we have deployed Metaflow on GKE using Argo, but are having trouble with the Metaflow UI accessing the Metaflow Service (getting a 404 response). Hitting `api.<our-service-dns>/ping` returns `pong` (mentioned here), so it seems as though we are able to at least access the service. We are also able to run jobs, which we can track through Argo. Any idea what might be going on here? cc: @ripe-car-38698
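    A minimal connectivity sketch for this kind of debugging, assuming the UI is pointed at the metadata service whose `/ping` answers above; the hostname is the placeholder from the message, and `/flows` is assumed to be a valid metadata-service route:
    ```python
    # If /ping answers but other routes 404, the UI may be configured with the
    # wrong base path for the Metaflow Service. Hostname is a placeholder.
    import requests

    base = "https://api.<our-service-dns>"
    for path in ("/ping", "/flows"):
        resp = requests.get(base + path, timeout=10)
        print(path, resp.status_code)
    ```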
  • ripe-dog-29417

    05/09/2025, 2:57 PM
    Hello, I am trying to use dynamic cards for my flow - I am building off of https://github.com/outerbounds/dynamic-card-examples/blob/main/sparklines-progress/sparklines.py. The issue is that my card does not get updated dynamically (I am on Metaflow 2.15.11): at the start I see the table, and the table gets updated only once at the end (intermediate steps are not visible)
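    For comparison, a minimal runtime-card sketch, assuming a recent Metaflow with dynamic cards enabled (`current.card.refresh()` and component `.update()` as in the dynamic cards docs):
    ```python
    # Minimal dynamic-card sketch: a Markdown component updated during the step.
    # If intermediate refreshes never show up, the card/UI versions are suspects.
    from metaflow import FlowSpec, step, card, current
    from metaflow.cards import Markdown


    class ProgressFlow(FlowSpec):

        @card(type="blank")
        @step
        def start(self):
            import time

            status = Markdown("# starting...")
            current.card.append(status)
            for i in range(10):
                status.update(f"# processed {i + 1}/10 batches")
                current.card.refresh()  # push the intermediate state to the card
                time.sleep(1)
            self.next(self.end)

        @step
        def end(self):
            pass


    if __name__ == "__main__":
        ProgressFlow()
    ```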
  • straight-tiger-68114

    05/09/2025, 11:55 AM
    Hello, I am working to integrate Metaflow and Argo using events. I have a webhook that captures events from running flows and forwards them to other services, and I also want external services to be able to trigger workflows via events. So far, I have two webhooks: one configured as the `METAFLOW_ARGO_EVENTS_INTERNAL_WEBHOOK_URL` for sending events from the flows to the outside world, and another that is exposed for other services to push events into Argo. Adding the `@trigger` decorator and trying to deploy the workflow returns an error:
    ```
    An Argo Event name hasn't been configured for your deployment yet. Please see this article for more details on event names - <https://argoproj.github.io/argo-events/eventsources/naming/>. It is very likely that all events for your deployment share the same name. You can configure it by executing `metaflow configure kubernetes` or setting METAFLOW_ARGO_EVENTS_EVENT in your configuration. If in doubt, reach out for support at <http://chat.metaflow.org>
    ```
    My question is: if I configure `METAFLOW_ARGO_EVENTS_EVENT`, will that interfere with the existing outgoing events data flow? Alternatively, I could set up Sensors to trigger each flow, but I was hoping to keep this configuration piece in the Metaflow realm.
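    For context, a sketch of the trigger side once `METAFLOW_ARGO_EVENTS_EVENT` is configured; the event name `data_ready` is an illustrative placeholder, not from this thread:
    ```python
    # Hedged sketch: a flow triggered by a named event, plus publishing that
    # event from Python. The event name "data_ready" is a placeholder.
    from metaflow import FlowSpec, step, trigger
    from metaflow.integrations import ArgoEvent


    @trigger(event="data_ready")
    class TriggeredFlow(FlowSpec):

        @step
        def start(self):
            self.next(self.end)

        @step
        def end(self):
            pass


    if __name__ == "__main__":
        TriggeredFlow()

    # elsewhere, e.g. in an external service:
    # ArgoEvent(name="data_ready").publish(payload={"key": "value"})
    ```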
  • fast-advantage-42097

    05/09/2025, 3:46 AM
    Hi, I was looking around the API reference and came across the section on programmatic Runners. Does this integrate with flows that are deployed via Step Functions? The docs make it sound like it ends up propagating to the equivalent of a CLI execution rather than Step Functions
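    That reading matches the split in the API: Runner wraps a local, CLI-equivalent `run`, while executing on a Step Functions deployment goes through Deployer. A sketch, with the flow file name as a placeholder:
    ```python
    # Hedged sketch: Runner == local, CLI-equivalent execution; triggering the
    # deployed Step Functions state machine goes through Deployer instead.
    from metaflow import Runner, Deployer

    # local execution, equivalent to `python myflow.py run`
    with Runner("myflow.py").run() as executing:
        print("local run:", executing.run.id)

    # execution on AWS Step Functions via the deployed state machine
    deployed = Deployer("myflow.py").step_functions().create()
    triggered = deployed.trigger()
    print("sfn run:", triggered.run)
    ```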
  • lively-lunch-9285

    05/09/2025, 2:08 AM
    Me in a metaflow issue asking to make flow run diffs really awesome in the UI like ClearML and ZenML have done.
    👀 2
  • hundreds-wire-22547

    05/08/2025, 7:19 PM
    For the example below, would it be possible to specify the k8s namespace to get the deployed flow from, instead of relying on config?
    ```python
    from metaflow import DeployedFlow

    # use the identifier saved above..
    deployed_flow = DeployedFlow.from_deployment(identifier=identifier)
    triggered_run = deployed_flow.trigger()
    ```
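    For reference, the deploy-side counterpart under the documented API; whether a Kubernetes namespace can be passed explicitly here, rather than read from the active config, is exactly the open question:
    ```python
    # Sketch of the documented path only; the namespace still comes from the
    # Metaflow config/profile. Flow file name is a placeholder.
    from metaflow import Deployer

    deployed = Deployer("myflow.py").argo_workflows().create()
    triggered_run = deployed.trigger()
    ```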
  • hundreds-football-74720

    05/08/2025, 10:10 AM
    Hi, any help with this error? We are also not getting the artifacts in the Metaflow UI:
    ```
    Traceback (most recent call last):
      File "/usr/local/bin/ui_backend_service", line 33, in <module>
        sys.exit(load_entry_point('metadata-service', 'console_scripts', 'ui_backend_service')())
      File "/root/services/ui_backend_service/ui_server.py", line 152, in main
        loop.run_forever()
      File "/usr/local/lib/python3.11/asyncio/base_events.py", line 607, in run_forever
        self._run_once()
      File "/usr/local/lib/python3.11/asyncio/base_events.py", line 1922, in _run_once
        handle._run()
      File "/usr/local/lib/python3.11/asyncio/events.py", line 80, in _run
        self._context.run(self._callback, *self._args)
      File "/usr/local/lib/python3.11/site-packages/aiohttp/web_protocol.py", line 452, in _handle_request
        resp = await request_handler(request)
      File "/usr/local/lib/python3.11/site-packages/aiohttp/web_app.py", line 543, in _handle
        resp = await handler(request)
      File "/usr/local/lib/python3.11/site-packages/aiohttp/web_middlewares.py", line 114, in impl
        return await handler(request)
      File "/root/services/utils/__init__.py", line 89, in wrapper
        err_trace = getattr(err, 'traceback_str', None) or get_traceback_str()
      File "/root/services/utils/__init__.py", line 84, in wrapper
        return await func(*args, **kwargs)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^
      File "/root/services/ui_backend_service/api/log.py", line 113, in get_task_log_stderr
        return await self.get_task_log(request, STDERR)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
      File "/root/services/ui_backend_service/api/log.py", line 220, in get_task_log
        lines, page_count = await read_and_output(self.cache, task, logtype, limit, page, reverse_order)
                            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
      File "/root/services/ui_backend_service/api/log.py", line 261, in read_and_output
        raise LogException("Cache returned None for log content and raised no errors. \

    services.ui_backend_service.api.log.LogException: Cache returned None for log content and raised no errors. The cache server might be experiencing issues.
    ```
  • adorable-oxygen-86530

    05/06/2025, 3:43 PM
    Hey. Since Metaflow 2.15.8+ we have been experiencing some interesting S3 errors in our setup:
    ```
    2025-05-06 15:39:35.945 [759/start/6468 (pid 139695)] Transient S3 failure (attempt #1) -- total success: 2, last attempt 2/4 -- remaining: 2
    2025-05-06 15:39:39.125 [759/start/6468 (pid 139695)] Transient S3 failure (attempt #2) -- total success: 2, last attempt 0/2 -- remaining: 2
    2025-05-06 15:39:44.889 [759/start/6468 (pid 139695)] Transient S3 failure (attempt #3) -- total success: 2, last attempt 0/2 -- remaining: 2
    2025-05-06 15:39:53.515 [759/start/6468 (pid 139695)] Transient S3 failure (attempt #4) -- total success: 2, last attempt 0/2 -- remaining: 2
    ```
    which eventually leads to a complete halt of the workflow. Our setup consists of an on-prem MinIO S3 instance and Metaflow, all running on a Kubernetes cluster. Switching back to 2.15.7 magically resolves the error. Any ideas? Cheers
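    A small repro outside any flow can help bisect this, assuming `METAFLOW_S3_ENDPOINT_URL` points at the MinIO instance; bucket and key below are placeholders:
    ```python
    # Hedged repro: exercise Metaflow's S3 client directly against MinIO, then
    # repeat under 2.15.7 vs 2.15.8+ to confirm the regression is client-side.
    from metaflow import S3

    with S3(s3root="s3://my-bucket/scratch/") as s3:
        s3.put("probe", "hello")        # one small upload
        print(s3.get("probe").text)     # and read it back
    ```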
  • able-battery-82852

    05/06/2025, 7:38 AM
    Ok cool! Thanks for the pointers. I already started playing around with named envs, and it works great
  • dry-beach-38304

    05/06/2025, 7:26 AM
    nice. Let me know if you have other questions. In case you hadn't noticed, there is also a `metaflow environment` command which is somewhat useful 🙂 (`metaflow environment --help` will provide some more info). You also have `myflow.py --environment=conda environment --help`, and lastly, I think I also have `Runner("myflow.py", environment="conda").environment` type of stuff (that last one may actually still be pending and not yet in the open version).
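    The last variant mentioned would look roughly like this; as the message itself notes, the `.environment` accessor may not have shipped in the open-source release yet, so this is a sketch of the described interface, not a verified API:
    ```python
    # Sketch of the interface described above; `.environment` is explicitly
    # flagged as possibly unreleased, so treat this as illustrative only.
    from metaflow import Runner

    runner = Runner("myflow.py", environment="conda")
    print(runner.environment)  # may still be pending in the open version
    ```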
  • able-battery-82852

    05/06/2025, 7:23 AM
    Ok, no, but it all makes sense! It's kind of useless otherwise to use isolated conda envs and then do something else 🙂 (unless you really have to). Named envs are really nice though, and are exactly the kind of flexibility I need.
  • dry-beach-38304

    05/06/2025, 7:21 AM
    technically you can use some system-wide packages and the ones in your conda env. I wouldn't necessarily recommend it fully and it's definitely not the easiest, but we do have something called the escape hatch, which allows you to access packages outside the conda environment from within it. The use case we have for this internally is to access company-specific packages that are installed on all images from inside any conda environment. It typically involves authentication or access to certain resources. So that is possible, but I wouldn't use it for vanilla packages because it would be a pain to set up and probably won't work in all cases 🙂.
  • able-battery-82852

    05/06/2025, 7:18 AM
    @dry-beach-38304, cool that you're looking into uv support. Honestly, I haven't used anything else for anything new in Python lately
  • able-battery-82852

    05/06/2025, 7:17 AM
    I really wanted both the system-wide packages as well as the ones defined in my conda env. But yeah, I do understand the limitation 🙂
  • able-battery-82852

    05/06/2025, 7:16 AM
    Hmm... no, I think I am just really bad at reading the documentation. There's already a hint in my screenshot. It actually makes sense though, as conda runs in an isolated environment. What I'm trying to achieve is a custom Docker image with base packages that I want to use within a flow step. That would work fine, but not when combined with @conda or @pypi. But that's clearly stated in the Metaflow docs as well. It's one or the other, really. Named environments seem really useful for my use case, though, so I'm probably going to end up using them
  • dry-beach-38304

    05/06/2025, 7:13 AM
    I really need to update the extension. I have more stuff internally that I haven't pushed out. Nothing super fancy, but at least compatibility with conda v3 and mamba 2.0+. Oh, and uv support for installing stuff (so, all in all, mostly speed stuff)
  • dry-beach-38304

    05/06/2025, 7:13 AM
    it should work but maybe I missed something.
  • dry-beach-38304

    05/06/2025, 7:12 AM
    hey — what’s the error?
  • able-battery-82852

    05/06/2025, 5:24 AM
    I'm now using `metaflow-nflx-ext` and this works flawlessly! Just one question: it doesn't seem to be possible to combine a custom image with `@batch` and add additional libraries with `@pypi` or `@conda`. Is this because the extension handles these environments differently? According to the original Metaflow docs this should be possible, so I'm wondering why it is not possible with the extension.
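    For reference, the combination in question with stock Metaflow, where it is documented to work; image name and package pin are placeholders:
    ```python
    # Hedged sketch of the questioned combination: custom image on @batch plus
    # extra libraries via @pypi. The extension's behaviour is the open question.
    from metaflow import FlowSpec, step, batch, pypi


    class ImagePlusPypiFlow(FlowSpec):

        @batch(image="my-registry/my-base:latest")  # placeholder image
        @pypi(packages={"pandas": "2.2.2"})         # placeholder pin
        @step
        def start(self):
            import pandas as pd

            print(pd.__version__)
            self.next(self.end)

        @step
        def end(self):
            pass


    if __name__ == "__main__":
        ImagePlusPypiFlow()
    ```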
  • enough-article-90757

    05/05/2025, 9:57 PM
    Hey! I'm attempting to use the `pypi` decorator for package management, but when I run this in the `metaflow-dev` stack, I get an error saying that Metaflow can't find Conda artifacts. Has anyone seen this before, and is it expected? This is my invocation + output:
    ```
    ❯ python ubuntu_updates.py --environment=pypi run --with kubernetes
    Metaflow 2.15.10 executing UbuntuUpdatesFlow for user:coder
    Validating your flow...
        The graph looks good!
    Running pylint...
        Pylint not found, so extra checks are disabled.
    2025-05-05 21:50:24.808 Bootstrapping virtual environment(s) ...
    2025-05-05 21:50:24.881 Virtual environment(s) bootstrapped!
    2025-05-05 21:50:25.263 Workflow starting (run-id 4), see it in the UI at <http://localhost:3000/UbuntuUpdatesFlow/4>
    2025-05-05 21:50:25.673 [4/start/8 (pid 434281)] Task is starting.
    2025-05-05 21:50:26.493 [4/start/8 (pid 434281)] [pod t-840c2468-jfv52-wt56v] Task is starting (Pod is pending, Container is waiting - ContainerCreating)...
    2025-05-05 21:50:27.218 [4/start/8 (pid 434281)] [pod t-840c2468-jfv52-wt56v] Setting up task environment.
    2025-05-05 21:50:32.327 [4/start/8 (pid 434281)] [pod t-840c2468-jfv52-wt56v] Downloading code package...
    2025-05-05 21:50:32.938 [4/start/8 (pid 434281)] [pod t-840c2468-jfv52-wt56v] Code package downloaded.
    2025-05-05 21:50:32.977 [4/start/8 (pid 434281)] [pod t-840c2468-jfv52-wt56v] Task is starting.
    2025-05-05 21:50:33.839 [4/start/8 (pid 434281)] [pod t-840c2468-jfv52-wt56v] Bootstrapping virtual environment...
    2025-05-05 21:50:35.659 [4/start/8 (pid 434281)] [pod t-840c2468-jfv52-wt56v] Bootstrap failed while executing: set -e;
    2025-05-05 21:50:35.659 [4/start/8 (pid 434281)] [pod t-840c2468-jfv52-wt56v]             tmpfile=$(mktemp);
    2025-05-05 21:50:35.659 [4/start/8 (pid 434281)] [pod t-840c2468-jfv52-wt56v]             echo "@EXPLICIT" > "$tmpfile";
    2025-05-05 21:50:37.212 [4/start/8 (pid 434281)] Kubernetes error:
    2025-05-05 21:50:37.212 [4/start/8 (pid 434281)] Error: Setting up task environment.
    2025-05-05 21:50:37.212 [4/start/8 (pid 434281)] Downloading code package...
    2025-05-05 21:50:37.212 [4/start/8 (pid 434281)] Code package downloaded.
    2025-05-05 21:50:37.212 [4/start/8 (pid 434281)] Task is starting.
    2025-05-05 21:50:37.213 [4/start/8 (pid 434281)] Bootstrapping virtual environment...
    2025-05-05 21:50:37.213 [4/start/8 (pid 434281)] Bootstrap failed while executing: set -e;
    2025-05-05 21:50:37.213 [4/start/8 (pid 434281)] tmpfile=$(mktemp);
    2025-05-05 21:50:37.213 [4/start/8 (pid 434281)] echo "@EXPLICIT" > "$tmpfile";
    2025-05-05 21:50:37.213 [4/start/8 (pid 434281)] ls -d /metaflow/.pkgs/conda/*/* >> "$tmpfile";
    2025-05-05 21:50:37.345 [4/start/8 (pid 434281)] export PATH=$PATH:$(pwd)/micromamba;
    2025-05-05 21:50:37.346 [4/start/8 (pid 434281)] export CONDA_PKGS_DIRS=$(pwd)/micromamba/pkgs;
    2025-05-05 21:50:37.346 [4/start/8 (pid 434281)] export MAMBA_NO_LOW_SPEED_LIMIT=1;
    2025-05-05 21:50:37.346 [4/start/8 (pid 434281)] export MAMBA_USE_INDEX_CACHE=1;
    2025-05-05 21:50:37.346 [4/start/8 (pid 434281)] export MAMBA_NO_PROGRESS_BARS=1;
    2025-05-05 21:50:37.346 [4/start/8 (pid 434281)] export CONDA_FETCH_THREADS=1;
    2025-05-05 21:50:37.346 [4/start/8 (pid 434281)] micromamba create --yes --offline --no-deps --safety-checks=disabled --no-extra-safety-checks --prefix /metaflow/linux-64/f35ade658f6977a --file "$tmpfile" --no-pyc --no-rc --always-copy;
    2025-05-05 21:50:37.346 [4/start/8 (pid 434281)] rm "$tmpfile"
    2025-05-05 21:50:37.346 [4/start/8 (pid 434281)] Stdout:
    2025-05-05 21:50:37.346 [4/start/8 (pid 434281)] Stderr: ls: cannot access '/metaflow/.pkgs/conda/*/*': No such file or directory
    2025-05-05 21:50:37.346 [4/start/8 (pid 434281)]
    2025-05-05 21:50:37.346 [4/start/8 (pid 434281)] (exit code 1). This could be a transient error. Use @retry to retry.
    2025-05-05 21:50:37.346 [4/start/8 (pid 434281)]
    2025-05-05 21:50:35.659 [4/start/8 (pid 434281)] [pod t-840c2468-jfv52-wt56v]             ls -d /metaflow/.pkgs/conda/*/* >> "$tmpfile";
    2025-05-05 21:50:35.659 [4/start/8 (pid 434281)] [pod t-840c2468-jfv52-wt56v]             export PATH=$PATH:$(pwd)/micromamba;
    2025-05-05 21:50:35.659 [4/start/8 (pid 434281)] [pod t-840c2468-jfv52-wt56v]             export CONDA_PKGS_DIRS=$(pwd)/micromamba/pkgs;
    2025-05-05 21:50:35.659 [4/start/8 (pid 434281)] [pod t-840c2468-jfv52-wt56v]             export MAMBA_NO_LOW_SPEED_LIMIT=1;
    2025-05-05 21:50:35.659 [4/start/8 (pid 434281)] [pod t-840c2468-jfv52-wt56v]             export MAMBA_USE_INDEX_CACHE=1;
    2025-05-05 21:50:35.659 [4/start/8 (pid 434281)] [pod t-840c2468-jfv52-wt56v]             export MAMBA_NO_PROGRESS_BARS=1;
    2025-05-05 21:50:35.659 [4/start/8 (pid 434281)] [pod t-840c2468-jfv52-wt56v]             export CONDA_FETCH_THREADS=1;
    2025-05-05 21:50:35.659 [4/start/8 (pid 434281)] [pod t-840c2468-jfv52-wt56v]             micromamba create --yes --offline --no-deps --safety-checks=disabled --no-extra-safety-checks --prefix /metaflow/linux-64/f35ade658f6977a --file "$tmpfile" --no-pyc --no-rc --always-copy;
    2025-05-05 21:50:35.659 [4/start/8 (pid 434281)] [pod t-840c2468-jfv52-wt56v]             rm "$tmpfile"
    2025-05-05 21:50:35.659 [4/start/8 (pid 434281)] [pod t-840c2468-jfv52-wt56v] Stdout:
    2025-05-05 21:50:35.659 [4/start/8 (pid 434281)] [pod t-840c2468-jfv52-wt56v] Stderr: ls: cannot access '/metaflow/.pkgs/conda/*/*': No such file or directory
    2025-05-05 21:50:35.659 [4/start/8 (pid 434281)] [pod t-840c2468-jfv52-wt56v]
    2025-05-05 21:50:37.362 [4/start/8 (pid 434281)] Task failed.
    2025-05-05 21:50:37.381 Workflow failed.
    2025-05-05 21:50:37.382 Terminating 0 active tasks...
    2025-05-05 21:50:37.382 Flushing logs...
        Step failure:
        Step start (task-id 8) failed.
    ```
    Any info would be useful, thanks!!
    👀 1
  • white-helicopter-28706

    05/02/2025, 5:25 PM
    My framework is the following: Raw Data (Quartr API) → Process Data for modeling (sentiment analysis) → Generate Inference → Frontend Consumption. I'm thinking of using Metaflow artifacts and the S3 client in the following way:
    1. Raw Data Collection (Quartr API) -> use built-in artifacts
    2. Processing for Modeling -> use built-in artifacts
    3. Writing Outputs (Inference) -> S3 client (store them in S3)
    4. Dashboard Consumption -> FastAPI backend
    Does it make sense to use S3 for data collection/fetching as well?
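    A sketch of that split, with hypothetical helpers standing in for the Quartr, sentiment, and model code; note that artifacts are persisted to the datastore (S3) automatically, so the explicit S3 write is mainly for consumers outside Metaflow, like the FastAPI backend:
    ```python
    # Hedged sketch: artifacts carry data between steps; the S3 client writes
    # only the final outputs the dashboard reads. fetch_from_quartr, featurize
    # and score are hypothetical placeholders, as is the bucket path.
    import json

    from metaflow import FlowSpec, S3, step


    class SentimentFlow(FlowSpec):

        @step
        def start(self):
            self.raw = fetch_from_quartr()          # hypothetical helper
            self.next(self.process)

        @step
        def process(self):
            self.features = featurize(self.raw)     # hypothetical helper
            self.next(self.inference)

        @step
        def inference(self):
            scores = score(self.features)           # hypothetical helper
            with S3(s3root="s3://my-bucket/inference/") as s3:
                s3.put("latest.json", json.dumps(scores))
            self.next(self.end)

        @step
        def end(self):
            pass


    if __name__ == "__main__":
        SentimentFlow()
    ```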
  • prehistoric-waiter-14304

    05/02/2025, 3:19 PM
    Using a Deployer object in a Metaflow flow, I'm running into the following pylint error. This occurs with versions `2.15.0` as well as `2.15.9`:
    ```
    E1101: Instance of 'Deployer' has no 'step_functions' member (no-member)
    ```
    It doesn't seem like this should be happening? https://docs.metaflow.org/api/deployer#Deployer.step_functions
    ✅ 1
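    The usual workaround sketch: the orchestrator accessors on `Deployer` are injected at runtime, which pylint's static analysis cannot see, so an inline disable (or a pylintrc exception) keeps the linter quiet:
    ```python
    # Hedged sketch: silence the static no-member check for the dynamically
    # injected accessor; flow file name is a placeholder.
    from metaflow import Deployer

    deployer = Deployer("myflow.py")
    deployed = deployer.step_functions().create()  # pylint: disable=no-member
    ```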
  • elegant-painter-46407

    05/01/2025, 12:21 PM
    Hey, I noticed there is an ArgoClient provided by the Metaflow team for submitting Argo workflows. The issue for me is how parameters are handled by Metaflow-originated Argo workflows as opposed to native Argo workflows. What I am trying to pass is a `string`: a comma-separated list of strings which will be split on `,` by the flow (the exact same method is used in both cases). The method simply does:
    ```python
    out_list = [out.strip() for out in input_string.split(delimiter)]
    return out_list
    ```
    Now, I have two flows:
    1. A Metaflow flow that has been converted to an Argo workflow, e.g. `python parameter_flow.py --with retry argo-workflows create`. When using the ArgoClient to submit the workflow with `parameters = {"comma_sep_list_of_strings": "string1,string2"}`, the application successfully splits this into a list.
    2. In contrast, when submitting the native Argo workflow with the same parameter `parameters = {"comma_sep_list_of_strings": "string1,string2"}`, the list ends up being `['"string1', 'string2"']`.
    • The default value for this `comma_sep_list_of_strings` parameter in both cases is `""`.
    • (JSON file) The representation of the default value in the Metaflow-created Argo workflow is `"\"\""`.
    • (Argo UI) The representation of the default value in the native Argo workflow is `""`.
    It should be noted that I don't have the same issue when submitting the native Argo workflow from the UI. Is there anything I am missing on how to submit the native Argo workflow? Thanks
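    Until the quoting difference is pinned down, a defensive parser on the flow side is one hedge; this sketch strips the stray double quotes that the native submission path appears to add:
    ```python
    # Hedged sketch: tolerate both quoting behaviours described above.
    def split_csv(input_string: str, delimiter: str = ",") -> list:
        return [part.strip().strip('"') for part in input_string.split(delimiter)]


    print(split_csv("string1,string2"))      # ['string1', 'string2']
    print(split_csv('"string1,string2"'))    # ['string1', 'string2'] as well
    ```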
  • rhythmic-beach-70913

    04/30/2025, 9:15 AM
    Hitting the dreaded 8192-character limit with Step Functions again (ref https://github.com/Netflix/metaflow/issues/1482). One thing which would help is to remove the `"note": "Internal representation of IncludeFile(…)"` from the parameters. Is there any reason not to do that? (I can't see how it could be used anyway.) Or at least allow it to be set rather than always defaulting. The main problem is the length of:
    ```
    "ContainerOverrides": {
        "Command": [
            "bash",
            "-c",
            "true && mkdir -p $PWD/.logs && ....
    ```
  • hallowed-soccer-94479

    04/28/2025, 8:48 PM
    hi, are there any docs on how to use the `@checkpoint` decorator when using `@parallel` steps? The docs I found here say TODO: https://github.com/outerbounds/metaflow-checkpoint-examples/blob/master/documentation/checkpoint_deco/checkpoint_usage.md#saving--loa[…]rallel-steps
    ✅ 1
  • cold-balloon-7686

    04/25/2025, 6:43 PM
    Hey, I'm trying to find a programmatic way of running Metaflow on Argo Workflows using NBRunner. I tried the following without success:
    ```python
    from metaflow import FlowSpec, step, NBRunner
    
    
    class HelloFlow(FlowSpec):
        @step
        def start(self):
            self.x = 1
            self.next(self.end)
    
        @step
        def end(self):
            self.x += 1
            print("Hello world! The value of x is", self.x)
    
    
    run = NBRunner(HelloFlow).nbrun(decospecs=["argo-workflows"])
    ```
    • Any idea how to achieve this?
    • Any idea how to default to using Argo Workflows?
    Thanks
    ✅ 1
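    One relevant distinction: NBRunner executes flows locally from a notebook, while deploying to and triggering on Argo Workflows is what NBDeployer is for. A sketch, assuming NBDeployer mirrors Deployer's orchestrator accessors:
    ```python
    # Hedged sketch: deploy the flow to Argo Workflows from a notebook cell and
    # trigger it, instead of routing NBRunner through Argo.
    from metaflow import NBDeployer

    deployed = NBDeployer(HelloFlow).argo_workflows().create()
    run = deployed.trigger()
    ```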
  • able-battery-82852

    04/25/2025, 7:45 AM
    Does anyone have any idea if this can be fixed in any way? Right now I'm resorting back to `@conda`. Whether run locally on arm64 or pushed to Batch, it gives the same error
  • mammoth-rainbow-82717

    04/25/2025, 7:07 AM
    Hi, I was wondering about the order of the steps within a `foreach` loop on Kubernetes/Argo. Will the order of the steps respect the order of the list provided to the `foreach`, or can the order be random somehow? TIA
    ✅ 1
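    The usual defensive pattern, since foreach tasks may be scheduled in any order on Kubernetes/Argo: carry the input item as an artifact and restore ordering at the join. A sketch:
    ```python
    # Hedged sketch: don't rely on scheduling order; reorder in the join step.
    from metaflow import FlowSpec, step


    class OrderedForeachFlow(FlowSpec):

        @step
        def start(self):
            self.items = ["a", "b", "c"]
            self.next(self.work, foreach="items")

        @step
        def work(self):
            self.item = self.input          # remember which element this task got
            self.result = self.item.upper()
            self.next(self.join)

        @step
        def join(self, inputs):
            by_item = {inp.item: inp.result for inp in inputs}
            # inputs[0].items is the original list, so order is recoverable
            self.results = [by_item[i] for i in inputs[0].items]
            self.next(self.end)

        @step
        def end(self):
            print(self.results)


    if __name__ == "__main__":
        OrderedForeachFlow()
    ```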
  • acoustic-van-30942

    04/24/2025, 6:58 PM
    Hello team, how do we delete old pipeline runs in Metaflow and remove the data from S3?
    ✅ 1
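    There is no built-in delete command, so a common approach is to enumerate stale runs with the Client API and then remove the matching datastore prefixes with S3 tooling. The prefix layout (`<DATASTORE_SYSROOT_S3>/FlowName/<run_id>`) and UTC timestamps are assumptions to verify before deleting anything:
    ```python
    # Hedged sketch: list candidate runs only; the actual deletion (e.g. via
    # `aws s3 rm --recursive` on each printed prefix) is deliberately manual.
    from datetime import datetime, timedelta, timezone

    from metaflow import Flow

    cutoff = datetime.now(timezone.utc) - timedelta(days=90)
    for run in Flow("MyFlow"):                  # flow name is a placeholder
        finished = run.finished_at
        # timestamps assumed UTC; verify against your metadata service
        if finished and finished.replace(tzinfo=timezone.utc) < cutoff:
            print(f"stale: s3://my-bucket/metaflow/MyFlow/{run.id}")
    ```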