# flyte-deployment
  • b

    bumpy-cartoon-62963

    01/23/2025, 3:49 PM
    Hi, I'd like to add support for deploying Flyte on Nebius AI Cloud. What do you think? Is there a guide, or can I create a PR to add support?
  • g

    gorgeous-caravan-46442

    01/28/2025, 7:50 PM
    Hey everyone, I'm trying the single-cluster deployment of Flyte on EKS using the AWS CDK (a constraint at work). I'm following "flyte: the hard way". Has anyone done this before? I'll put details of where I'm stuck in the thread.
  • r

    ripe-smartphone-56353

    01/29/2025, 1:14 PM
    What is the recommended update procedure for flyte-core? I'm trying to upgrade from 1.13.2 to 1.14.1, and it seems there are some database migrations involved. On our first try, we let the new flyteadmin/run-migrations container run for ~35 minutes before deciding to roll back. Is it recommended to shut down all other Flyte-related workloads while the migrations run? Is it simply expected to take longer than ~30 minutes for these migrations to finish? There also seems to be no logging in the run-migrations container, at least not at log level 3.
  • s

    shy-morning-17240

    02/10/2025, 8:37 PM
    Hi all, I've successfully deployed flyte-core onto an on-prem Kubernetes cluster, and I'm able to run simple CPU-only workflows as well as multi-node, multi-GPU workflows.
  • s

    shy-morning-17240

    02/10/2025, 8:38 PM
    however, whenever my scripts generate a file or directory, which I then try to output from the tasks as a FlyteFile or FlyteDirectory object, I get the following exception:
  • s

    shy-morning-17240

    02/10/2025, 8:39 PM
    Failed to convert type <class 'flytekit.types.file.file.FlyteFile.__class_getitem__.<locals>._SpecificFormatClass'> to type <class 'flytekit.types.file.file.FlyteFile.__class_getitem__.<locals>._SpecificFormatClass'>. Error Message: Access Denied..
  • s

    shy-morning-17240

    02/10/2025, 8:50 PM
    I've checked that the account I configured in the "storage:" section of the configuration file has read/write permissions on the S3 bucket, so I'm not sure why this is happening. Could this be related to the default service account used to run the workflow?
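One way to narrow this down, sketched under assumptions: as we understand it, the "storage:" credentials in the Helm values are used by the Flyte control-plane services, while the task pod uploads FlyteFile/FlyteDirectory outputs with whatever credentials exist inside that pod (environment variables or the pod's service account). The hypothetical helper below, run from a throwaway task, reports which credential-related variables are set without printing their values. The variable list is an assumption to adjust for your setup.

```python
import os

# Assumed list of credential-related variables that S3 clients in a task pod
# commonly read (FLYTE_AWS_* for flytekit, AWS_* for boto-style clients,
# AWS_ROLE_ARN / AWS_WEB_IDENTITY_TOKEN_FILE when IRSA injects a role).
CANDIDATE_VARS = (
    "FLYTE_AWS_ENDPOINT",
    "FLYTE_AWS_ACCESS_KEY_ID",
    "FLYTE_AWS_SECRET_ACCESS_KEY",
    "AWS_ACCESS_KEY_ID",
    "AWS_SECRET_ACCESS_KEY",
    "AWS_ROLE_ARN",
    "AWS_WEB_IDENTITY_TOKEN_FILE",
)


def credential_env_report(names=CANDIDATE_VARS, environ=None):
    """Return {name: is_set}; values are never included, so secrets stay out of logs."""
    env = os.environ if environ is None else environ
    return {name: bool(env.get(name)) for name in names}


if __name__ == "__main__":
    for name, is_set in credential_env_report().items():
        print(f"{name}: {'set' if is_set else 'missing'}")
```

If none of these are set in the task pod and the pod's service account has no attached role, an Access Denied on output upload would be consistent with that.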
  • p

    purple-father-70173

    02/10/2025, 10:50 PM
    Hi, I'm having some issues with submitting my first RayJob with Flyte 🧡
  • t

    thousands-airline-37812

    02/12/2025, 2:44 PM
    Hi, I need to run Selenium in my Flyte workflow, but since the selenium library is not installed, I get the following error. Please help me.
    Trace:
    
        Traceback (most recent call last):
          File "/usr/local/lib/python3.10/site-packages/flytekit/bin/entrypoint.py", line 164, in _dispatch_execute
            task_def = load_task()
          File "/usr/local/lib/python3.10/site-packages/flytekit/bin/entrypoint.py", line 583, in load_task
            return resolver_obj.load_task(loader_args=resolver_args)
          File "/usr/local/lib/python3.10/site-packages/flytekit/core/utils.py", line 312, in wrapper
            return func(*args, **kwargs)
          File "/usr/local/lib/python3.10/site-packages/flytekit/core/python_auto_container.py", line 271, in load_task
            task_module = importlib.import_module(name=task_module)  # type: ignore
          File "/usr/local/lib/python3.10/importlib/__init__.py", line 126, in import_module
            return _bootstrap._gcd_import(name[level:], package, level)
          File "<frozen importlib._bootstrap>", line 1050, in _gcd_import
          File "<frozen importlib._bootstrap>", line 1027, in _find_and_load
          File "<frozen importlib._bootstrap>", line 1006, in _find_and_load_unlocked
          File "<frozen importlib._bootstrap>", line 688, in _load_unlocked
          File "<frozen importlib._bootstrap_external>", line 883, in exec_module
          File "<frozen importlib._bootstrap>", line 241, in _call_with_frames_removed
          File "/root/browser_test.py", line 2, in <module>
            from selenium import webdriver
        ModuleNotFoundError: No module named 'selenium'
    
    Message:
    
        ModuleNotFoundError: No module named 'selenium'
    from flytekit import task, workflow
    from selenium import webdriver
    from selenium.webdriver.chrome.service import Service
    from webdriver_manager.chrome import ChromeDriverManager
    import time
    import subprocess
    
    @task
    def open_google() -> str:
        install_selenium()  # not defined in the posted snippet
        # # # service = Service('/usr/local/bin/chromedriver')
        # # # options = webdriver.ChromeOptions()
        # # # driver = webdriver.Chrome(service=service, options=options)
        driver = webdriver.Chrome(service=Service(ChromeDriverManager().install()))
        driver.get("https://www.google.com")
        time.sleep(3)
        title = driver.title
        driver.quit()
        return f"Page title: {title}"
    
    @workflow
    def browser_test_wf() -> str:
        return open_google()
    
    if __name__ == "__main__":
        print(browser_test_wf())
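The traceback shows that the failure happens inside load_task, while the task module is being imported. Line 2 of browser_test.py (from selenium import webdriver) runs before open_google() is ever called, so installing Selenium from inside the task cannot fix this. The package has to be present in the task's container image already, for example through an ImageSpec packages list like the one in the Playwright message further down. A small check such as the following, run with the image's Python interpreter, reports which top-level modules are missing. The function name is hypothetical.

```python
import importlib.util


def missing_modules(names):
    """Return the top-level modules from `names` that cannot be found by this interpreter."""
    missing = []
    for name in names:
        # Only check the top-level package: find_spec on a dotted name
        # would try to import the parent package first.
        top_level = name.split(".")[0]
        if importlib.util.find_spec(top_level) is None:
            missing.append(top_level)
    return missing


if __name__ == "__main__":
    # Module names taken from the snippet above.
    print(missing_modules(["selenium", "webdriver_manager"]))
```

An empty list means both modules are importable in that environment.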
  • p

    proud-apple-49696

    02/13/2025, 3:42 PM
    Hello, I'm trying to build a project using Flyte and Selenium, but I've run into an issue. I initially developed the project with plain Python and Selenium, and it works fine. The workflow is quite simple:
    1. Navigate to a specified URL.
    2. Click the login button and fill in the required fields.
    3. Solve the CAPTCHA.
    4. Run a data search.
    5. Save the listed data as a CSV file, move to the next page, and repeat the same process.
    Everything works as expected up to this point. However, when I try to migrate these steps to Flyte as separate tasks, I run into a problem: WebDriver objects apparently cannot be shared between tasks, which causes an error. I don't want to write all my code in a single script; I'd like a modular structure. But there's no continuity between tasks. For example, after the login task, I need to continue without closing the browser: once logged in, I should be able to run the search function and then the data extraction function. I can't figure out how to build such a workflow in Flyte. How can I solve this? Any guidance would be greatly appreciated!
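On Kubernetes, each Flyte task typically runs in its own pod, and task inputs and outputs must be serializable, so a live browser session cannot be handed from one task to the next. A common way to keep the code modular anyway is to make login, search, and extraction plain Python functions that share one driver, and call them from a single task. The sketch below uses a hypothetical FakeDriver stand-in (not part of Selenium) so it runs without a browser. In a real task you would pass a selenium webdriver.Chrome instance and put scrape() inside an @task.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class FakeDriver:
    """Hypothetical stand-in for a Selenium WebDriver.

    It only mimics get(), current_url and quit(), which real drivers also have.
    """
    current_url: str = ""
    visited: List[str] = field(default_factory=list)
    closed: bool = False

    def get(self, url: str) -> None:
        self.current_url = url
        self.visited.append(url)

    def quit(self) -> None:
        self.closed = True


def login(driver, login_url: str) -> None:
    driver.get(login_url)
    # ... fill in the login form and solve the CAPTCHA here ...


def search(driver, search_url: str) -> None:
    driver.get(search_url)
    # ... submit the search form here ...


def extract_rows(driver) -> List[str]:
    # Placeholder: real code would read the result table from the page.
    return [driver.current_url]


def scrape(driver, login_url: str, search_url: str) -> List[str]:
    """One browser session for all steps; in Flyte this body would sit inside one @task."""
    try:
        login(driver, login_url)
        search(driver, search_url)
        return extract_rows(driver)
    finally:
        driver.quit()  # close the browser even if a step raises
```

Steps that do not need the browser, such as post-processing the saved CSV, can still be separate tasks that receive the file as an output of the scraping task.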
  • a

    alert-kitchen-27022

    02/17/2025, 5:18 PM
    Hello everyone, I recently deployed Flyte to a Kubernetes cluster, and I can access the console page without any issues. The problem is that when I try to use Flyte remote, I keep getting gRPC connection errors:
    /opt/homebrew/lib/python3.11/site-packages/flytekit/clients/grpc_utils/wrap_exception_interceptor.py:32 in _raise_if_exc
    FlyteSystemUnavailableException: Flyte cluster is currently unavailable. Please make sure the cluster is up and running.
    WARNING: All log messages before absl::InitializeLog() is called are written to STDERR
    E0000 00:00:1739812143.306775 1021532 init.cc:229] grpc_wait_for_shutdown_with_timeout() timed out.
    Any pointers to an existing issue like this, or to relevant documentation, would be helpful. Thank you!
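A reasonable first check for FlyteSystemUnavailableException is whether the admin endpoint from the client config is reachable at the TCP level from the machine running the client. A reachable console over HTTP does not by itself show that the gRPC endpoint is reachable. The helper below is a hypothetical standard-library sketch. It only tests TCP connectivity and does not check TLS or gRPC routing.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # covers DNS failures, refused connections and timeouts
        return False
```

For example, can_connect("flyte.example.com", 443) with the host and port replaced by the admin endpoint in your config. A False result points at DNS, ingress, or firewall issues rather than at flytekit.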
  • t

    thousands-airline-37812

    02/18/2025, 11:31 AM
    from flytekit import ImageSpec, Resources, task, workflow
    from playwright.sync_api import sync_playwright
    
    install_commands = [
        "apt-get update",
        "apt-get install -y libnss3 libnspr4 libatk1.0-0 libatk-bridge2.0-0 libcups2 libdrm2 libxcomposite1 libxdamage1 libxfixes3 libxrandr2 libgbm1 libxkbcommon0 libpango-1.0-0 libcairo2 libasound2 libatspi2.0-0",
        "python -m pip install --upgrade pip",
        "python -m pip install playwright==1.50.0",
        "python -m playwright install chromium",
    ]
    
    image_spec = ImageSpec(
        name="playwright-flyte",
        base_image="ghcr.io/flyteorg/flytekit:py3.10-1.15.0",
        packages=["playwright==1.50.0"],
        registry="localhost:30000",
        commands=install_commands
    )
    
    @task(container_image=image_spec)
    def test_google_title() -> str:
        
        with sync_playwright() as playwright:
            browser = playwright.chromium.launch(headless=True)
            page = browser.new_page()
            
            # Go to the Google home page
            page.goto("https://www.google.com")

            # Get the page title
            title = page.title()

            # Close the browser
            browser.close()
            
            return title
    
    @workflow
    def browser_test_wf() -> str:
        return test_google_title()
    
    if __name__ == "__main__":
        print(browser_test_wf())
    I want to run the code above, but I get the error below in the Kubernetes logs and haven't been able to solve it. Please help me.
    Traceback (most recent call last):
      File "/opt/micromamba/envs/runtime/bin/pyflyte-fast-execute", line 4, in <module>
        from flytekit.bin.entrypoint import fast_execute_task_cmd
      File "/opt/micromamba/envs/runtime/lib/python3.10/site-packages/flytekit/__init__.py", line 222, in <module>
        from flytekit.core.array_node_map_task import map_task
      File "/opt/micromamba/envs/runtime/lib/python3.10/site-packages/flytekit/core/array_node_map_task.py", line 15, in <module>
        from flytekit.core.array_node import array_node
      File "/opt/micromamba/envs/runtime/lib/python3.10/site-packages/flytekit/core/array_node.py", line 6, in <module>
        from flytekit.core import interface as flyte_interface
      File "/opt/micromamba/envs/runtime/lib/python3.10/site-packages/flytekit/core/interface.py", line 25, in <module>
        from flytekit.core import context_manager
      File "/opt/micromamba/envs/runtime/lib/python3.10/site-packages/flytekit/core/context_manager.py", line 32, in <module>
        from flytekit.core.data_persistence import FileAccessProvider, default_local_file_access_provider
      File "/opt/micromamba/envs/runtime/lib/python3.10/site-packages/flytekit/core/data_persistence.py", line 671, in <module>
        data_config=DataConfig.auto(),
      File "/opt/micromamba/envs/runtime/lib/python3.10/site-packages/flytekit/configuration/__init__.py", line 657, in auto
        config_file = get_config_file(config_file)
      File "/opt/micromamba/envs/runtime/lib/python3.10/site-packages/flytekit/configuration/file.py", line 259, in get_config_file
        if current_location_config.exists():
      File "/opt/micromamba/envs/runtime/lib/python3.10/pathlib.py", line 1290, in exists
        self.stat()
      File "/opt/micromamba/envs/runtime/lib/python3.10/pathlib.py", line 1097, in stat
        return self._accessor.stat(self, follow_symlinks=follow_symlinks)
    PermissionError: [Errno 13] Permission denied: 'flytekit.config'
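The last frames are a stat() on the relative path 'flytekit.config', which flytekit performs while looking for a config file in the current directory. For a relative path with no directory component, EACCES from stat() means the process lacks search (execute) permission on its current working directory. It is therefore worth checking which user the container runs as and which directory the task starts in. The following is a minimal diagnostic sketch with a hypothetical function name, meant to be run in the same image, for example from a throwaway task.

```python
import os


def cwd_access_report() -> dict:
    """Summarize which user we run as and what we may do in the current directory."""
    cwd = os.getcwd()
    getuid = getattr(os, "getuid", None)  # os.getuid is not available on Windows
    return {
        "uid": getuid() if getuid else None,
        "cwd": cwd,
        "can_search_cwd": os.access(cwd, os.X_OK),  # needed to stat relative paths
        "can_list_cwd": os.access(cwd, os.R_OK),
    }


if __name__ == "__main__":
    print(cwd_access_report())
```

If can_search_cwd is False, the container user does not match the owner or permissions of the working directory. That is consistent with the PermissionError above.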
  • t

    thousands-airline-37812

    02/20/2025, 8:54 AM
    Hello, I ran my workflow on the local Docker-based Flyte runtime that I started with flytectl demo start. What settings do I need to change to run this at production level? Is there any documentation?
  • s

    shy-morning-17240

    02/21/2025, 12:03 AM
    I have a very odd issue. I've configured my Flyte client (flytectl) so that it points to my flyte-core deployment's admin/console ingresses. My goal is to avoid port-forwarding for both MinIO and flyteadmin. After configuring my console/admin HTTP and gRPC ingresses, I run
    flytectl get projects
    and successfully get the projects in my cluster:
     --------------- --------------- --------------------------- 
    | ID (4)        | NAME          | DESCRIPTION               |
     --------------- --------------- --------------------------- 
    | #####        | #####        | #############             |
     --------------- --------------- --------------------------- 
    | flyteexamples | flyteexamples | flyteexamples description |
     --------------- --------------- --------------------------- 
    | flytetester   | flytetester   | flytetester description   |
     --------------- --------------- --------------------------- 
    | flytesnacks   | flytesnacks   | flytesnacks description   |
     --------------- --------------- --------------------------- 
    4 rows
    I can also reach my MinIO instance in the cluster without port-forwarding. However, when I try to run Python code on the remote server using
    pyflyte -v run --remote -p ###### -d development some_example.py some_workflow
    , I get the following error:
    ...
    _InactiveRpcError: <_InactiveRpcError of RPC that terminated with:
            status = StatusCode.UNAVAILABLE
            details = "DNS resolution failed for flyte.somedomain.com: C-ares status is not ARES_SUCCESS qtype=A 
    name=flyte.somedomain.com is_balancer=0: Timeout while contacting DNS servers"
            debug_error_string = "UNKNOWN:Error received from peer  {grpc_message:"DNS resolution failed for flyte.somedomain.com: 
    C-ares status is not ARES_SUCCESS qtype=A name=flyte.somedomain.com is_balancer=0: Timeout while contacting DNS servers", 
    grpc_status:14, created_time:"2025-02-20T15:43:24.043117-08:00"}"
    ...
    FlyteSystemUnavailableException: Flyte cluster is currently unavailable. Please make sure the cluster is up and running.
    ...
    RuntimeError: Failed to get signed url for fastec41b71eb12116e44b8039fa355d3577.tar.gz.
    All the pods are running normally, and when I configure the Flyte client to use the port-forwarded address, everything works fine. I also checked whether my configured hostname resolves from a test pod running in the same cluster (running
    host flyte.somedomain.com
    ) and I get back the flyteadmin pod's IP, so it doesn't seem to be an issue with internal cluster DNS lookup. Any ideas why flytectl can talk to the flyteadmin pod running in the cluster, but pyflyte fails?
  • a

    average-finland-92144

    02/21/2025, 1:04 PM
    This seems to be a known issue with the gRPC DNS resolver, and here's a workaround (https://github.com/grpc/grpc/issues/19954#issuecomment-2468374813). What version of flytekit are you on?
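Below is a sketch of two checks to run on the client machine, under stated assumptions. First, socket.getaddrinfo uses the operating system's resolver. If it resolves the hostname while pyflyte's gRPC channel reports a c-ares failure, that points at gRPC's bundled resolver rather than at DNS itself. This would also fit flytectl working, since it is a Go program with its own gRPC stack. Second, a workaround often reported for c-ares failures is to make gRPC use the system resolver by setting GRPC_DNS_RESOLVER=native before gRPC is loaded. We assume, without having verified it, that this matches the linked comment. The function name and hostname are placeholders.

```python
import os
import socket

# Assumption: setting this before grpc is imported makes gRPC use the
# system ("native") resolver instead of c-ares. Shell equivalent, set
# before running pyflyte:  export GRPC_DNS_RESOLVER=native
os.environ.setdefault("GRPC_DNS_RESOLVER", "native")


def system_resolve(host: str, port: int = 443) -> list:
    """Resolve host with the OS resolver; return sorted unique IPs, or [] on failure."""
    try:
        infos = socket.getaddrinfo(host, port, proto=socket.IPPROTO_TCP)
    except socket.gaierror:
        return []
    return sorted({info[4][0] for info in infos})
```

For example, system_resolve("flyte.somedomain.com") on the laptop should return the same address that the in-cluster test pod saw if the system resolver can see that name.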
  • f

    fierce-farmer-40956

    02/21/2025, 1:19 PM
    Hello team, I am configuring Flyte from scratch with the Helm chart in a GKE cluster with Istio and Google IAP. I am running into the issues described in https://github.com/flyteorg/flyte/issues/6089
  • s

    shy-morning-17240

    02/21/2025, 5:46 PM
    While trying to get pyflyte to work over HTTP connections only (no port-forwarding from the cluster), I'm now getting the following HTTP 400 error from pyflyte register:
    Response: <?xml version="1.0" encoding="UTF-8"?>
    <Error>
      <Code>BadDigest</Code>
      <Message>The Content-Md5 you specified did not match what we received.</Message>
      <Key>flytesnacks/development/CB66MYDKJETQMEK6WZSLGPS3AM======/fastec41b71eb12116e44b8039fa355d3577.tar.gz</Key>
      <BucketName>flyte</BucketName>
      <Resource>/flyte/flytesnacks/development/CB66MYDKJETQMEK6WZSLGPS3AM======/fastec41b71eb12116e44b8039fa355d3577.tar.gz</Resource>
      <RequestId>18264962F16175E9</RequestId>
      <HostId>dd9025bab4ad464b049177c95eb6ebf374d3b3fd1af9251148b658df7ac2e3e8</HostId>
    </Error>
    I suspect this is some kind of race condition (but I might be wrong). Any idea why this might happen, especially since it doesn't happen when port-forwarding both flyte-admin and MinIO? (I also noticed that it happens when port-forwarding flyte-admin but using a Kubernetes ingress for MinIO.)
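Some context on BadDigest: the Content-MD5 header (RFC 1864) carries the base64 encoding of the raw 16-byte MD5 digest of the request body, not the hex string. The object store rejects the upload when the digest of the bytes it actually received does not match. Anything between the client and MinIO that alters the body would therefore produce this error. Computing the expected value locally for the archive being uploaded lets you compare it with the header that reaches MinIO, for example in ingress or proxy logs. The following is a standard-library sketch with hypothetical function names.

```python
import base64
import hashlib


def content_md5_of_bytes(data: bytes) -> str:
    """Content-MD5 header value: base64 of the raw (binary) MD5 digest."""
    return base64.b64encode(hashlib.md5(data).digest()).decode("ascii")


def content_md5_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Same value for a file, hashed in chunks so large archives need little memory."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return base64.b64encode(digest.digest()).decode("ascii")
```

For instance, content_md5_of_bytes(b"hello") returns "XUFAKrxLKna5cZ2REBfFkg==", whereas the hex digest of the same bytes is 5d41402abc4b2a76b9719d911017c592.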
  • s

    shy-morning-17240

    02/25/2025, 5:24 PM
    Is there a reason why the Flyte on-prem docs require users to port-forward the blob-storage service, while the normal cloud deployment path doesn't require port-forwarding the storage service (only flyteadmin/console)? I'm doing an on-prem deployment and can't figure out why it works normally if I port-forward the services, but I get an MD5 checksum error if I try to do everything through ingresses.
  • t

    thousands-airline-37812

    02/27/2025, 1:41 PM
    Hello everyone, by running pyflyte run --remote -p my-project -d development my-project.py browser_test_wf, I can run my workflow on my local Docker-based Flyte runtime. How can I run it from my computer on the Docker-based Flyte runtime installed on my server?
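pyflyte decides where to send requests from its client config file. The file that flytectl demo start writes points at localhost; as we recall, its admin endpoint is localhost:30080 with insecure: true, but treat these values as assumptions and compare them with your own ~/.flyte/config.yaml. To target the server instead, you can pass pyflyte's --config option a separate file whose endpoint names the server. The sketch below only renders such a file. The host is a placeholder, the port assumes the demo default, and the server must expose that port to your computer. Fast registration also uploads code to the demo cluster's object store, so that storage endpoint may need to be reachable from your computer as well.

```python
from pathlib import Path

# Assumed layout, modeled on the client config that `flytectl demo start` writes.
TEMPLATE = """\
admin:
  endpoint: {host}:{port}
  insecure: true
console:
  endpoint: http://{host}:{port}
"""


def render_client_config(host: str, port: int = 30080) -> str:
    """Return a minimal Flyte client config (YAML text) pointing at host:port."""
    return TEMPLATE.format(host=host, port=port)


def write_client_config(path: str, host: str, port: int = 30080) -> Path:
    """Write the rendered config to `path` (e.g. a hypothetical ~/.flyte/server-config.yaml)."""
    target = Path(path).expanduser()
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(render_client_config(host, port))
    return target
```

Usage would then be along the lines of pyflyte --config ~/.flyte/server-config.yaml run --remote -p my-project -d development my-project.py browser_test_wf.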
  • p

    proud-answer-87162

    04/16/2025, 8:10 PM
    Hi all, I'm sure this has been asked before, but I can't find a firm answer. It looks like flyte-core has a config block that allows you to create custom domains:
    # -- Domains configuration for Flyte projects. This enables the specified number of domains across all projects in Flyte.
      domain:
        domains:
          - id: development
            name: development
          - id: staging
            name: staging
          - id: production
            name: production
    Is there a way to do that using flyte-binary? Has anyone looked into how much work would be required to extend the chart to support this config?
  • f

    fierce-farmer-40956

    04/22/2025, 5:54 PM
    Hello, in Flyte, is there a way to get information about the user who submitted the run?
  • f

    fierce-farmer-40956

    05/02/2025, 3:53 PM
    I see three types of Prometheus metrics: admin, propeller, and console. I also see that one is meant to rely on kube_state_metrics pod metrics to get other information about the actual execution (like CPU usage, etc.). Are there metrics exposed directly by the running pods that we should also collect? Or what other ways are there to collect metrics on running executions?
  • p

    purple-boots-20156

    05/28/2025, 2:02 PM
    Hello everyone, I'm trying to figure out how to approach a greenfield deployment of Flyte in an on-prem environment where everything must stay on-prem. We have 50 TB of training data, multiple training servers with 100 TB of local storage each, and NetApp network storage with 500 TB. The links between them are at least 100G. Any ideas on how to approach such a setup in the most efficient way and avoid bottlenecks caused by data transfer? Data streaming is probably the obvious answer, but I'm curious how to handle use cases involving this much data.
  • v

    victorious-rainbow-91536

    06/06/2025, 11:16 AM
    @victorious-rainbow-91536 has left the channel
  • v

    victorious-rainbow-91536

    06/06/2025, 11:21 AM
    @victorious-rainbow-91536 has left the channel
  • p

    purple-father-70173

    06/18/2025, 7:08 PM
    Hi, I'm having some issues with Flyte logging links for the Ray plugin. I'm using flyte-binary 1.15.3, and I currently use Grafana with Loki as a log viewer. I don't see any information in the documentation on how to configure dynamic logging links via a custom decorator using flyte-binary (assuming that's the right way to do this). Does anyone have experience setting up logging links for Ray?
  • h

    high-park-16144

    07/04/2025, 12:32 PM
    Hi all! I'm trying to set up OIDC for flyte-core via Keycloak and am getting stuck on the flytePropeller config. As far as I can see from the docs, flytePropeller needs the scope offline, but in Keycloak it is named offline_access. How can I change this behavior? In thirdPartyConfig we can change these settings for flyteClient, but not for flytePropeller.
  • w

    worried-airplane-87065

    07/07/2025, 5:59 PM
    For folks deploying their own Flyte instance, how are you measuring reliability of the deployment?
  • s

    shy-morning-17240

    07/25/2025, 11:35 AM
    Hi, I'm trying to add authentication to my Flyte deployment. After configuring OAuth, all my pods start up normally except for the flytescheduler pod, which errors out with:
    panic: authentication error! Original Error: <nil>, Auth Error: failed to issue token. Error: failed to get new token: failed to get new token: oauth2: "invalid_client" "Client authentication failed (e.g., unknown client, no client authentication included, or unsupported authentication method)."
    Why could this be happening?
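The invalid_client response comes from the OAuth2 token endpoint, which means it rejected the client ID/secret pair the scheduler presented, or the way it presented them. One way to test the same credentials outside Flyte, sketched here under assumptions, is to request a client-credentials token yourself. The token URL, client ID, secret, and scope are placeholders. The sketch assumes HTTP Basic client authentication (client_secret_basic); if the identity provider expects a different method, that mismatch alone can produce this error.

```python
import base64
import urllib.parse
import urllib.request


def build_token_request(token_url, client_id, client_secret, scopes=()):
    """Build (but do not send) an OAuth2 client-credentials token request.

    Per RFC 6749 section 2.3.1, the ID and secret are form-encoded before
    being joined with ':' and base64-encoded into the Basic header.
    """
    user = urllib.parse.quote_plus(client_id)
    password = urllib.parse.quote_plus(client_secret)
    basic = base64.b64encode(f"{user}:{password}".encode()).decode("ascii")
    form = {"grant_type": "client_credentials"}
    if scopes:
        form["scope"] = " ".join(scopes)
    return urllib.request.Request(
        token_url,
        data=urllib.parse.urlencode(form).encode(),
        headers={
            "Authorization": f"Basic {basic}",
            "Content-Type": "application/x-www-form-urlencoded",
        },
        method="POST",
    )
```

Sending it with urllib.request.urlopen(req) should return a token if the credentials are accepted. An HTTP 400 or 401 whose body says invalid_client would indicate that the secret or the authentication method is the problem, not the scheduler itself.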
  • b

    brainy-raincoat-7497

    07/30/2025, 4:11 PM
    Hi Flyters, we are planning to run our Flyte cluster on Lambda Cloud, where we can rent GPU instances (such as A10, A100, and H100) or even whole clusters. Can you please recommend a guide or method for setting up Flyte on Kubernetes?