glamorous-carpet-83516
09/29/2024, 7:21 AM
freezing-airport-6809
steep-jackal-21573
09/30/2024, 7:44 AM
dry-pizza-97077
09/30/2024, 8:47 AM
alert-oil-1341
10/01/2024, 12:08 AM
flyteagent dashboards people are using to monitor the agents? We're just using the default k8s deployment monitoring now, but it would be nice to have something a bit more agent-specific.
high-park-82026
- <something>
inside them are considered maps) are merged additively (existing keys are merged, not replaced). Lists, however, are replaced (so if you have a list of 2 elements in one file and 3 elements in another, the last one wins, with those 3 elements in the final config). I could be wrong though; I've not touched this code in a while. Can you show me the exact config map you are hoping to use? The one posted above doesn't have any overlap that I could see...
billowy-church-83438
10/02/2024, 10:40 PM
billowy-church-83438
10/02/2024, 10:40 PM
/etc/flyte/config (from configmap)
[flyte@flytepropeller-f6d98fbd8-bqdgc /]$ ls -ls /etc/flyte/config/
total 0
0 lrwxrwxrwx 1 root 65534 17 Oct 2 04:54 admin.yaml -> ..data/admin.yaml
0 lrwxrwxrwx 1 root 65534 25 Oct 2 04:54 agent_service.yaml -> ..data/agent_service.yaml
0 lrwxrwxrwx 1 root 65534 17 Oct 2 04:54 cache.yaml -> ..data/cache.yaml
0 lrwxrwxrwx 1 root 65534 19 Oct 2 04:54 catalog.yaml -> ..data/catalog.yaml
0 lrwxrwxrwx 1 root 65534 19 Oct 2 04:54 copilot.yaml -> ..data/copilot.yaml
0 lrwxrwxrwx 1 root 65534 16 Oct 2 04:54 core.yaml -> ..data/core.yaml
0 lrwxrwxrwx 1 root 65534 27 Oct 2 04:54 enabled_plugins.yaml -> ..data/enabled_plugins.yaml
0 lrwxrwxrwx 1 root 65534 15 Oct 2 04:54 k8s.yaml -> ..data/k8s.yaml
0 lrwxrwxrwx 1 root 65534 20 Oct 2 04:54 kingkong.yaml -> ..data/kingkong.yaml
0 lrwxrwxrwx 1 root 65534 18 Oct 2 04:54 logger.yaml -> ..data/logger.yaml
0 lrwxrwxrwx 1 root 65534 16 Oct 2 04:54 mufn.yaml -> ..data/mufn.yaml
0 lrwxrwxrwx 1 root 65534 28 Oct 2 04:54 resource_manager.yaml -> ..data/resource_manager.yaml
0 lrwxrwxrwx 1 root 65534 19 Oct 2 04:54 storage.yaml -> ..data/storage.yaml
0 lrwxrwxrwx 1 root 65534 21 Oct 2 04:54 task_logs.yaml -> ..data/task_logs.yaml
• centralized agent-service.yaml
[flyte@flytepropeller-f6d98fbd8-bqdgc /]$ cat /etc/flyte/config/agent_service.yaml
plugins:
  agent-service:
    defaultAgent:
      endpoint: flyteagent:8000
      insecure: true
    supportedTaskTypes:
    - sensor
• centralized enabled_plugins
[flyte@flytepropeller-f6d98fbd8-bqdgc /]$ cat /etc/flyte/config/enabled_plugins.yaml
plugins:
  agent-service:
    supportedTaskTypes:
    - sensor
tasks:
  task-plugins:
    default-for-task-types:
      container: container
      container_array: k8s-array
      echo: echo
      mpi: mpi
      pytorch: pytorch
      ray: ray
      sensor: agent-service
      sidecar: sidecar
      tensorflow: tensorflow
    enabled-plugins:
    - container
    - sidecar
    - k8s-array
    - tensorflow
    - mufn
    - mpi
    - pytorch
    - ray
    - echo
    - agent-service
billowy-church-83438
10/02/2024, 10:40 PM
agent-service
plugins:
  agent-service:
    agents:
      custom-agent:
        endpoint: custom_agent_end_point_FQDN:8000
        insecure: true
    supportedTaskTypes:
    - custom_sensor
    - customtask
• Custom plugins related (minimum)
plugins:
  agent-service:
    supportedTaskTypes:
    - customsensor
    - customtask
tasks:
  task-plugins:
    default-for-task-types:
      customtask: agent-service
      customsensor: agent-service
      # OTHER TASKS are configured by the centralized Flyte team
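A minimal sketch of the merge behavior described earlier in this thread (maps merged key by key, lists replaced wholesale), applied to trimmed-down versions of the two configs above. It assumes that description is accurate, which its author was not sure of, and merge_config is a hypothetical helper written for illustration, not Flyte's actual config loader.

```python
from typing import Any


def merge_config(base: dict[str, Any], override: dict[str, Any]) -> dict[str, Any]:
    """Merge two config trees: maps are merged key by key, lists are replaced."""
    merged = dict(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Both sides are maps: recurse so existing keys survive.
            merged[key] = merge_config(merged[key], value)
        else:
            # Scalars and lists: the later file wins outright.
            merged[key] = value
    return merged


# Trimmed-down stand-ins for the centralized and custom configs above.
central = {
    "plugins": {"agent-service": {"supportedTaskTypes": ["sensor"]}},
    "tasks": {"task-plugins": {"default-for-task-types": {"sensor": "agent-service"}}},
}
custom = {
    "plugins": {"agent-service": {"supportedTaskTypes": ["customsensor", "customtask"]}},
    "tasks": {
        "task-plugins": {
            "default-for-task-types": {
                "customtask": "agent-service",
                "customsensor": "agent-service",
            }
        }
    },
}

result = merge_config(central, custom)
print(result["plugins"]["agent-service"]["supportedTaskTypes"])
# -> ['customsensor', 'customtask']
print(sorted(result["tasks"]["task-plugins"]["default-for-task-types"]))
# -> ['customsensor', 'customtask', 'sensor']
```

Under these assumed semantics the default-for-task-types map keeps the centralized sensor entry, but the supportedTaskTypes list loses it, so the custom file would have to repeat every centrally configured task type.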
freezing-airport-6809
alert-oil-1341
10/07/2024, 2:45 PM
@workflow
def locking_wf() -> None:
    acquired, lock_handle = acquire_lock(lock_handle="my_lock", duration_in_seconds=10)
    cond = (
        conditional("test lock")
        .if_(acquired.is_true())
        .then(task1(lock_handle))
        .else_()
        .then(task2())
    )
    # Run release_lock only after the conditional has finished
    cond >> release_lock(lock_handle=lock_handle)
Where acquire_lock and release_lock are Agents responsible for maintaining the lock. Curious if others have similar use cases and/or implementations?
alert-oil-1341
10/09/2024, 5:26 PM
my.agent and my.v2.agent and also set the task names uniquely, i.e. my_task and my_task_v2. Any other thoughts?
brief-window-55364
10/21/2024, 2:05 PM
apiVersion: v1
data:
  config.yaml: |
    plugins:
      agent-service:
        agents:
          mockAgent:
            endpoint: 'dns:///flyte-mock-agent'
            insecure: true
            defaultServiceConfig: '{"loadBalancingConfig": [{"round_robin":{}}]}'
            timeouts:
              ExecuteTaskSync: 300s
              GetTask: 100s
            defaultTimeout: 30s
kind: ConfigMap
Note: the lack of defaultAgent.
But my executions are failing with [1/1] currentAttempt done. Last Error: USER::failed to get grpc connection with error: failed to exit idle mode: passthrough: received empty target in Build(), which suggests it's not picking up the config correctly (even though I know it's doing the listing just fine).
Do I need to specify the defaultAgent field?
brief-window-55364
10/22/2024, 9:31 AM
damp-lion-88352
10/24/2024, 4:46 AM
ContainerError in Flytekit's agent service, and I completely agree with the suggestion. I'd love to hear what others think about it.
For reference:
- ContainerError: https://github.com/flyteorg/flyte/blob/master/flyteidl/protos/flyteidl/core/errors.proto#L13-L24
- Agent service: https://github.com/flyteorg/flytekit/blob/master/flytekit/extend/backend/agent_service.py
alert-oil-1341
11/08/2024, 9:12 PM
AsyncAgentExecutorMixin code, the delete method is scheduled in the signal_handler, which is only called when a SIGINT signal is emitted. This prevents resource cleanup as part of normal execution (without any interruptions) and is blocking one of our test cases.
What are your thoughts on simply making this a final step in the execution, ensuring we get to that code every time? Something like:
def execute(self: PythonTask, **kwargs) -> LiteralMap:
    ctx = FlyteContext.current_context()
    ss = ctx.serialization_settings or SerializationSettings(ImageConfig())
    output_prefix = ctx.file_access.get_random_remote_directory()
    from flytekit.tools.translator import get_serializable
    task_template = get_serializable(OrderedDict(), ss, self).template
    self._agent = AgentRegistry.get_agent(task_template.type, task_template.task_type_version)
    resource_meta = asyncio.run(
        self._create(task_template=task_template, output_prefix=output_prefix, inputs=kwargs)
    )
    try:
        # Main execution logic
        resource = asyncio.run(self._get(resource_meta=resource_meta))
        if resource.phase != TaskExecution.SUCCEEDED:
            raise FlyteUserException(f"Failed to run the task {self.name} with error: {resource.message}")
        # Process outputs
        if task_template.interface.outputs and resource.outputs is None:
            local_outputs_file = ctx.file_access.get_random_local_path()
            ctx.file_access.get_data(f"{output_prefix}/outputs.pb", local_outputs_file)
            output_proto = utils.load_proto_from_file(literals_pb2.LiteralMap, local_outputs_file)
            return LiteralMap.from_flyte_idl(output_proto)
        if resource.outputs and not isinstance(resource.outputs, LiteralMap):
            return TypeEngine.dict_to_literal_map(ctx, resource.outputs)
        return resource.outputs
    finally:
        # Cleanup logic that runs after the try block, even if it returns or raises an exception
        try:
            asyncio.run(self._agent.delete(resource_meta=resource_meta))
        except Exception as e:
            logger.error(f"Error during resource cleanup: {e}")
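The control flow proposed above can be checked in isolation. The sketch below is a stripped-down illustration, not Flytekit code: FakeAgent and this standalone execute function are hypothetical stand-ins, and the phase values are plain strings. It shows the cleanup in finally running on both the successful return and the raised error.

```python
import asyncio


class FakeAgent:
    """Hypothetical stand-in for an agent; records whether delete() ran."""

    def __init__(self, phase: str) -> None:
        self.phase = phase
        self.deleted = False

    async def get(self) -> str:
        return self.phase

    async def delete(self) -> None:
        self.deleted = True


def execute(agent: FakeAgent) -> str:
    try:
        phase = asyncio.run(agent.get())
        if phase != "SUCCEEDED":
            raise RuntimeError(f"task failed in phase {phase}")
        return phase
    finally:
        # Runs on both the return and the raise above.
        try:
            asyncio.run(agent.delete())
        except Exception as exc:
            # A failed cleanup is reported but does not mask the original outcome.
            print(f"cleanup failed: {exc}")


ok_agent = FakeAgent("SUCCEEDED")
print(execute(ok_agent), ok_agent.deleted)
# -> SUCCEEDED True

bad_agent = FakeAgent("FAILED")
try:
    execute(bad_agent)
except RuntimeError as exc:
    print(exc, bad_agent.deleted)
# -> task failed in phase FAILED True
```

A finally block also runs when a KeyboardInterrupt propagates out of the try body, so the SIGINT case stays covered as long as the interrupt arrives while the try block is executing.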
damp-lion-88352
11/13/2024, 1:31 PM
ContainerError for this. The reason is that most agent failures are due to backend API errors, not issues within its container. What do you all think?
cc @glamorous-carpet-83516
https://github.com/flyteorg/flyte/pull/5916#pullrequestreview-2402463075
dry-egg-91175
11/22/2024, 8:35 PM
PythonFunctionTask) that maps to the custom agent and passes the parameters to the agent via the get_custom method.
We haven't yet deployed the Agent Service, but we're confused: won't FlytePropeller create a container for the custom task inheriting from PythonFunctionTask? Do we even need the custom task?
damp-lion-88352
11/28/2024, 4:22 PM
gray-ram-51379
12/03/2024, 1:02 AM
yaou@yaou-mn1 ~ % k get pods
NAME                               READY   STATUS    RESTARTS       AGE
ambassador-69d4998797-2rf9r        1/1     Running   0              4d23h
datacatalog-864796d889-ngqx2       1/1     Running   0              4d23h
flyte-pod-webhook-76d88c67-6x7rb   1/1     Running   0              4d23h
flyteadmin-5cf46b95c7-8dnnk        1/1     Running   0              4d23h
flyteagent-56bd59b5d4-9vjcg        1/1     Running   37 (33m ago)   6d3h
flyteconsole-69c4c8795d-hnhg8      1/1     Running   0              67d
colossal-nightfall-74781
03/04/2025, 5:32 PM
get method still runs every 10 sec. Any advice would be appreciated.
colossal-nightfall-74781
03/18/2025, 2:17 PM
alert-oil-1341
04/01/2025, 2:42 PM
get sometimes in rapid succession, causing issues w/ database lookups. We expected get to be called at some fixed interval, but that doesn't always seem to be the case. Is this interval configurable and, if so, how?
colossal-nightfall-74781
04/17/2025, 3:52 PM
freezing-airport-6809
freezing-airport-6809
hallowed-toothbrush-42565
04/29/2025, 8:58 PM
task_id=TASK:{project}:{domain}:{launch_plan}.{workflow}.{task}:{version}) from the context, but calling current_context() inside the connector returns task_id=TASK:local:local:local:local.
If this is not supported yet, is there a recommended workaround?
alert-oil-1341
05/27/2025, 9:33 PM
colossal-nightfall-74781
06/25/2025, 9:16 PM
async _get method of AsyncConnectorExecutorMixin and use the resource output as input for my deck? If I invoke Deck.publish() inside that method, will it show in the Flyte Console?
alert-oil-1341
08/06/2025, 3:31 PM
SyncAgentBase agent w/ Flytekit 1.15.4 and ran about 1000 workflows. Of those, a handful gave us this error:
currentAttempt done. Last Error: USER::failed to create task from agent with rpc error: code = Internal desc = failed to create edge_artifact task with error:
Trace:
Traceback (most recent call last):
  File "/usr/local/lib/python3.11/site-packages/flytekit/extend/backend/agent_service.py", line 102, in wrapper
    res = await func(self, request, context, *args, **kwargs)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/flytekit/extend/backend/agent_service.py", line 121, in CreateTask
    agent.create,
    ^^^^^^^^^^^^
AttributeError: 'EdgeArtifactAgent' object has no attribute 'create'
Message:
AttributeError: 'EdgeArtifactAgent' object has no attribute 'create'.
What's weird is that this is in the AsyncAgentBase code path. So I'm wondering if there's an issue w/ looking up the agent in the registry or something? Or is there another reason we'd get this strange behavior?
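A small stand-in reproduces the shape of the failure in the traceback above: a handler that calls create on whatever agent it is handed, and an agent that only defines do. The classes and functions here are hypothetical and are not Flytekit's; the sketch says nothing about why the request reached the async handler (registry lookup, routing, or something else). It only shows that the error message is what you would expect once it does.

```python
import asyncio


class SyncOnlyAgent:
    """Hypothetical stand-in for a sync agent: it defines do() and nothing else."""

    def do(self, payload: str) -> str:
        return f"done: {payload}"


class AsyncStyleAgent:
    """Hypothetical stand-in for an async agent: it defines create()."""

    def create(self, payload: str) -> str:
        return f"created: {payload}"


async def create_task(agent: object, payload: str) -> str:
    # Mirrors the shape of the failing frame: the handler assumes create() exists.
    return agent.create(payload)


def try_create(agent: object) -> str:
    try:
        return asyncio.run(create_task(agent, "edge_artifact"))
    except AttributeError as exc:
        return f"AttributeError: {exc}"


print(try_create(AsyncStyleAgent()))
# -> created: edge_artifact
print(try_create(SyncOnlyAgent()))
# -> AttributeError: 'SyncOnlyAgent' object has no attribute 'create'
```

Logging the task type and the agent class at the point where the request is dispatched may help narrow down whether the handful of failing executions were sent as create requests in the first place.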