# ask-for-help
e
I'd also like to point out that we don't even get a request ID when an error is thrown. This seems like a major shortcoming. Is there a way we can get the request trace ID both on errors and successful requests?
t
So the request is for these 2 parts?
1. Some request ID in the logging message; the `x-bentoml-request-id` would be fine
2. The same request ID in the error message
e
Yay! Thanks for responding to this. Yes to those parts, and I'd add one more: whether it's the request ID or the trace ID, it should show up in the logs when OpenTelemetry "decorates" them. That way, if a request fails, I can grab the trace ID from the error response body and/or headers and then search for it in whichever observability platform I'm using, which would pull up the logs from all the services involved in the trace. I think a trace ID would be more valuable than a request ID for that reason. That's what AWS X-Ray does (I believe X-Ray returns both), and I believe it's what all of the observability vendors have your API return if you instrument it with their auto-instrumentation tooling.
t
Yea, totally makes sense. Let me scope this out with the team and get you an eta
s
Hi Eric, thanks for the suggestion. What you can do today is propagate the trace context from the client to the BentoML service through the OpenTelemetry protocol. If a trace is not already started, it is recommended to start the trace context at the root of the call graph. If you start the trace context from the client, you will automatically know the trace id when an error (or success) is returned, because the service is guaranteed to carry the same trace id under the OpenTelemetry protocol.
Depending on your choice of client implementation, OpenTelemetry typically has good support. For example, if you use an `aiohttp` client, you can start a trace as shown here: https://opentelemetry-python-kinvolk.readthedocs.io/en/latest/instrumentation/aiohttp_client/aiohttp_client.html
BentoClient uses `aiohttp` underneath. You can start a trace with the following example.
```python
import numpy as np

from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from bentoml.client import Client
from opentelemetry.instrumentation.aiohttp_client import AioHttpClientInstrumentor

# Enable instrumentation
AioHttpClientInstrumentor().instrument()

# Set the tracer provider
trace.set_tracer_provider(TracerProvider())

tracer = trace.get_tracer(__name__)

# Create a new root span, set it as the current span in context
with tracer.start_as_current_span("parent") as span:
    client = Client.from_url("http://localhost:3000")
    res = client.classify(np.array([[5, 4, 3, 2]]))
    print(res, span)
```
Within the span, you have access to both the trace id and span id.
```
[1.] _Span(name="parent", context=SpanContext(trace_id=0xa0a6d729fa810529ca039f4f74daa245, span_id=0xd2b688f5ec252e34, trace_flags=0x01, trace_state=[], is_remote=False))
```
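If you want the ids as the same hex strings that appear in the logs below (rather than the integers in the repr above), here is a small sketch using the `format_trace_id` / `format_span_id` helpers from the OpenTelemetry API:
```python
from opentelemetry import trace

tracer = trace.get_tracer(__name__)

with tracer.start_as_current_span("parent") as span:
    ctx = span.get_span_context()
    # Hex-format the ids so they match what shows up in the access logs,
    # e.g. trace=a0a6d729fa810529ca039f4f74daa245
    print(trace.format_trace_id(ctx.trace_id))
    print(trace.format_span_id(ctx.span_id))
```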
On the server side, the request has the same trace id in both `api_server` and `runner`.
```
2023-04-08T04:39:39-0700 [INFO] [runner:iris_clf:1] _ (scheme=http,method=POST,path=/predict,type=application/octet-stream,length=32) (status=200,type=application/vnd.bentoml.NdarrayContainer,length=8) 0.827ms (trace=a0a6d729fa810529ca039f4f74daa245,span=f6636b5e84fae525,sampled=1)
2023-04-08T04:39:39-0700 [INFO] [api_server:iris_classifier:7] 127.0.0.1:50801 (scheme=http,method=POST,path=/classify,type=application/json,length=22) (status=200,type=application/json,length=5) 2.845ms (trace=a0a6d729fa810529ca039f4f74daa245,span=4a0f7029f04a41e3,sampled=1)
```
Over the wire, the trace data is sent as HTTP headers over the OpenTelemetry protocol; see the `traceparent` header.
```
Hypertext Transfer Protocol
    POST /classify HTTP/1.1\r\n
    Host: localhost:3000\r\n
    content-type: application/json\r\n
    traceparent: 00-90dfec16e4879db3ddae2241fa29fb16-da2a8c6f736a0282-01\r\n
    Accept: */*\r\n
    Accept-Encoding: gzip, deflate\r\n
    User-Agent: Python/3.10 aiohttp/3.8.3\r\n
    Content-Length: 22\r\n
        [Content length: 22]
    \r\n
    [Full request URI: http://localhost:3000/classify]
    [HTTP request 1/1]
    [Response in frame: 91]
    File Data: 22 bytes
```
With this, you know the exact trace id for every request you decide to trace.
What we should do better, though, is to automatically start a trace (if not already started) when using BentoClient.
e
This is fascinating; somehow it didn't occur to me that clients can start the trace so that REST APIs use the same ID. But REST APIs can be clients for other REST APIs, so... that tracks 😄 A problem we'd have is that not all teams in the company use OpenTelemetry or NewRelic instrumentation (TBH, I'm not sure how those two interoperate... I should look into that). Anyway, for other teams in the company and for theoretical external users of our BentoML APIs, they're just looking for a "request ID" in the response headers or body when things go wrong, so they can send it to my team, "the developers", to look into. So it'd be great if we didn't have to assume that API clients happen to be instrumented with OpenTelemetry. And naturally, since we're exposing it to other teams who don't use Python, I'm assuming the BentoClient isn't an option for them (I know that was just an example, but it's probably worth pointing out).
s
A Request ID should already be returned in the response today. See below for an example HTTP 200 response with the `x-bentoml-request-id` header. The Trace ID is currently not automatically returned in the response. The Request ID is not logged in the access log.
```
Hypertext Transfer Protocol
    HTTP/1.1 200 OK\r\n
    date: Sat, 08 Apr 2023 11:39:39 GMT\r\n
    server: uvicorn\r\n
    content-length: 5\r\n
        [Content length: 5]
    content-type: application/json\r\n
    x-bentoml-request-id: 9273968296152751308\r\n
    \r\n
    [HTTP response 1/1]
    [Time since request: 0.003232000 seconds]
    [Request in frame: 151]
    [Request URI: http://localhost:3000/classify]
    File Data: 5 bytes
```
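For example, any HTTP client can read that header off the response and pass it along to the service owners; a minimal sketch with `requests` (the endpoint and payload just reuse the iris example above):
```python
import requests

# Hypothetical call against the iris_classifier service above.
resp = requests.post(
    "http://localhost:3000/classify",
    json=[[5, 4, 3, 2]],
)

# Read the request id that BentoML sets on the response (see the capture above).
print(resp.status_code, resp.headers.get("x-bentoml-request-id"))
```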
However, returning a Request or Trace ID in the response is not the most reliable approach. It is also technically difficult to cover all paths, because the server would have to instrument every path that could potentially return a response, even those reached before a trace context is established. If feasible, you can add a header in your API service code to include the Trace ID manually. Though `trace_context` is an internal API, it should be stable.
```python
import numpy as np

import bentoml
from bentoml.io import NumpyNdarray
from bentoml._internal.context import trace_context

iris_clf_runner = bentoml.sklearn.get("iris_clf:latest").to_runner()

svc = bentoml.Service("iris_classifier", runners=[iris_clf_runner])


@svc.api(
    input=NumpyNdarray.from_sample(
        np.array([[4.9, 3.0, 1.4, 0.2]], dtype=np.double), enforce_shape=False
    ),
    output=NumpyNdarray.from_sample(np.array([0.0], dtype=np.double)),
)
async def classify(input_series: np.ndarray, ctx) -> np.ndarray:
    ctx.response.headers["x-bentoml-trace-id"] = str(trace_context.trace_id)
    return await iris_clf_runner.predict.async_run(input_series)
```
Trace instrumentation is best started from the caller side. OpenTelemetry is the most powerful when the entire call graph is propagating trace context. OpenTelemetry should support most popular frameworks, Python or not, out-of-the-box. https://opentelemetry.io/docs/instrumentation/
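If a client is not auto-instrumented, you can also propagate the context by hand; here is a minimal sketch using the OpenTelemetry propagation API together with `requests` (the URL and payload just reuse the iris example above):
```python
import requests
from opentelemetry import trace
from opentelemetry.propagate import inject
from opentelemetry.sdk.trace import TracerProvider

trace.set_tracer_provider(TracerProvider())
tracer = trace.get_tracer(__name__)

with tracer.start_as_current_span("parent") as span:
    headers = {}
    # Writes the W3C `traceparent` header for the current span into the dict,
    # so the BentoML service joins the same trace.
    inject(headers)

    resp = requests.post(
        "http://localhost:3000/classify",
        json=[[5, 4, 3, 2]],
        headers=headers,
    )
    # The same trace id will show up in the api_server and runner logs.
    print(resp.status_code, trace.format_trace_id(span.get_span_context().trace_id))
```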
e
Oh hey! I didn't know about the `ctx` parameter. That's great!
> Trace instrumentation is best started from the caller side. OpenTelemetry is the most powerful when the entire call graph is propagating trace context. OpenTelemetry should support most popular frameworks, Python or not, out-of-the-box
I'm aware that OpenTelemetry has instrumentation for many languages/frameworks. And internally (within the DE/AI team), we can certainly instrument our code, although we prefer the NewRelic instrumentation whenever it's supported because it actually injects more context than OTel does by default. That aside, if we're looking for a "request ID" (I really mean a "trace ID" when I say this, but customers don't necessarily know the difference), it must be generated and returned whether the customer (client) is instrumented or not.
> A Request ID should already be returned in the response today. See below for an example HTTP 200 response with the `x-bentoml-request-id` header.
I'm also aware that BentoML returns a request ID (although I don't believe it does when a 500 error is returned, which makes it even less useful than it already is).

**Rant about error handling in BentoML / Could BentoML support middlewares / custom error handling?**
BentoML is built on the Starlette framework, isn't it? Doesn't Starlette support some sort of middleware mechanism where you can add headers to every response, whether the request succeeds or not? It's really the error case that we care about; I don't imagine we'd be auditing success cases nearly as much, although it's definitely good to have that capability. With FastAPI, we're able to use middlewares to achieve this, and even set up a "global error handler" to make sure error responses are much more informative than the vague `500 Internal Server Error... something went wrong`. We regard returning errors that vague, without context, as a sin of REST API design, so it was frustrating to discover that BentoML doesn't give the user any control over that logic, effectively forcing you to always return useless error messages when runtime errors occur.

**Question about your code example**
I don't love that this approach requires data scientists to know about tracing when they write their services (we have them own their own services, mostly). It is cool that you can get the trace ID like this, though! In this example: would the trace ID make it into the response headers if the request failed due to a runtime error? Also, if an error happens during deserialization of the parameters, I'm assuming we wouldn't get the trace ID and correspondingly couldn't look up the error logs to diagnose that that's what happened.
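For concreteness, something along these lines is roughly what I have in mind (a rough sketch, assuming BentoML exposes an `svc.add_asgi_middleware` hook and reusing the internal `trace_context` you showed above; I haven't checked how it behaves on the error paths):
```python
from starlette.middleware.base import BaseHTTPMiddleware

import bentoml
from bentoml._internal.context import trace_context


class TraceIdHeaderMiddleware(BaseHTTPMiddleware):
    """Attach the trace id to every response the middleware sees, including 500s."""

    async def dispatch(self, request, call_next):
        response = await call_next(request)
        if trace_context.trace_id is not None:
            response.headers["x-bentoml-trace-id"] = str(trace_context.trace_id)
        return response


iris_clf_runner = bentoml.sklearn.get("iris_clf:latest").to_runner()
svc = bentoml.Service("iris_classifier", runners=[iris_clf_runner])

# Register the middleware on the service's ASGI app so individual API
# functions no longer need to set the header themselves.
svc.add_asgi_middleware(TraceIdHeaderMiddleware)
```
That way the data scientists wouldn't need to touch tracing in their API functions at all.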