# ask-for-help
n
Hi, were you able to solve it? We are facing a similar issue with the integration.
s
Hi @Navya Dalavayi No, it is still not solved for us, sadly 😞 In short, the tracing only shows up once in a while, and I don't know why. @Chaoyu's initial thought was that it had something to do with lazy imports, but he didn't think so after trying out ddtrace himself, because he saw logs coming through for him. I don't know if he had a chance to try out tracing in Datadog, because it may incur cost. At this point, I don't know what to look at. I think I need to read more about how distributed tracing works in Datadog before engaging the BentoML community again. If more users are facing a similar issue, the BentoML community may shift their priorities to look into this some more.
n
Hi @Shihgian Lee, thank you for the detailed info. I am having an issue when starting up the application itself. Would really appreciate help here. If you are available, can we have a quick huddle?
I am using bentoml 1.0.10 and datadog 1.6.3
s
Hi @Navya Dalavayi I have a busy remaining week. I am unable to do a quick huddle. I don't know much about the Datadog setup and don't have a local datadog setup. Our DevOps set it up on our Kubernetes cluster. I just deployed BentoML with Datadog integration to our cluster.
n
May I ask which version of bentoml you are using and how you integrated it with Datadog?
Is it using the ddtrace library, i.e., adding ddtrace-run to the entrypoint command?
s
No, I didn't use ddtrace-run. @Chaoyu recommended a turnkey solution where we initialize ddtrace within the code. First, I pip install ddtrace. Then I created a utility method:
import os


def initialize_tracer():
    # Only enable tracing when the Datadog environment is configured.
    env = os.getenv('ENV')
    dd_env = os.getenv('DD_ENV')
    version = os.getenv('DD_VERSION')
    pod_name = os.getenv('POD_NAME')
    if dd_env:
        # Import lazily so ddtrace is only pulled in when tracing is enabled.
        from ddtrace import tracer, patch_all
        tracer.set_tags({'env': env,
                         'team_owner': 'datascience',
                         'version': version,
                         'pod_name': pod_name})
        # Auto-instrument supported libraries (grpc, requests, etc.).
        patch_all()
Before all the BentoML imports in a service module, I call initialize_tracer(). I am using BentoML 1.0.10.
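For illustration, the top of the service module looks roughly like this (the helper module name and model tag are placeholders, not our real ones):
# serving.py - hypothetical layout; only the import ordering matters here
from tracing_utils import initialize_tracer  # assumed module holding the helper above

# Run the Datadog setup before any BentoML import so that patch_all()
# can instrument libraries as they get imported.
initialize_tracer()

import bentoml
from bentoml.io import JSON

runner = bentoml.sklearn.get("my_model:latest").to_runner()  # placeholder model tag
svc = bentoml.Service("my_service", runners=[runner])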
n
Thanks I'll try this out
QQ: have you tried ddtrace-run before, or did you go directly with this approach?
s
I never used ddtrace-run because it requires me to identify an entry point that the BentoML community needs to provide. I have been using the turnkey solution since BentoML 0.x.
n
Got it, thanks. I have edited the Dockerfile to change the entry point using the templating concept provided by BentoML since 1.0.x, but I am getting some issues on startup. I can share that snippet in case you want to try it out.
This was working for me in the prior versions, that is, before the major 1.0 release.
πŸ™ 1
@Shihgian Lee attaching the files for the dd integration which worked for me with bentoml==1.0.8.
bentofile.yaml:
service: "serving.py:svc" 

docker:
  env:
    BENTOML_CONFIG: "src/configuration.yaml"
  dockerfile_template: "Dockerfile.template"
Dockerfile.template:
{% extends bento_base_template %}
{% block SETUP_BENTO_ENTRYPOINT %}
{{ super() }}
ENTRYPOINT [ "{{ bento__entrypoint }}", "ddtrace-run", "bentoml", "serve", "{{ bento__path }}", "--production" ]
{% endblock %}
❤️ 1
hope this helps
s
Hi @Navya Dalavayi Thank you for sharing your code! I will definitely give it a try since the turnkey solution doesn't seem to work well for us. @Chaoyu I think it is worthwhile documenting the above in the BentoML containerization documentation. Datadog monitoring is pretty common nowadays.
🙌 1
➕ 1
@Chaoyu In the containerization documentation, we have the following:
{% extends bento_base_template %}
{% block SETUP_BENTO_ENTRYPOINT %}
{{ super() }}

...
ENTRYPOINT [ "{{ bento__entrypoint }}", "python", "-m", "awslambdaric" ]
{% endblock %}
Can I instead define the entry point as follows?
{% extends bento_base_template %}
{% block SETUP_BENTO_ENTRYPOINT %}
{{ super() }}

...
ENTRYPOINT [ "{{ bento__entrypoint }}" ]

CMD ["ddtrace-run", "bentoml", "serve", "{{ bento__path }}", "--production" ]
{% endblock %}
Cc @Navya Dalavayi
Hi @Navya Dalavayi Thank you for your instructions! I managed to use ddtrace-run with bentoml. My Dockerfile.template is a bit different but should not impact the end result:
{% extends bento_base_template %}
{% block SETUP_BENTO_ENTRYPOINT %}
{{ super() }}
CMD ["ddtrace-run", "bentoml", "serve", "{{ bento__path }}", "--production" ]
{% endblock %}
The generated Dockerfile has the following content:
...

USER bentoml

ENTRYPOINT [ "/home/bentoml/bento/env/docker/entrypoint.sh" ]

CMD ["ddtrace-run", "bentoml", "serve", "/home/bentoml/bento", "--production" ]
I prefer the ddtrace-run approach over patch_all() because I don't have to worry about initializing patch_all() as early as possible in the code. However, ddtrace-run didn't solve my problem. In my runner, I use Feast to fetch features from Google Datastore. In BentoML 0.13.x, I could see the grpc trace. After I upgraded to BentoML 1.0.10, the grpc trace showed up a couple of times and then disappeared forever. Datadog has grpc integration. I suspect the data retrieval from Google Datastore may be too fast for Datadog to capture the trace. I am curious whether you access any datastore in your runner. If you do, do you see any traces for your datastore?
n
Hey @Shihgian Lee I am not using any data stores as of now. Is the grpc integration of Datadog not automatic?
s
Hi @Navya Dalavayi Thank you for your response. The grpc integration support is out of the box and automatic. It stopped working for BentoML 1.0.x. After spending quite a bit of time reading and trying different things, I believe the traces are lost due to the new runner architecture; specifically micro-batching at the runner level. All my BentoML services are pretty simple and don't do a lot other than fetching data from a feature store and making inferences. If you plan to make external calls to data stores via a feature store, this is something to be aware of. I haven't tried adding a custom span yet. I suspect it won't work either.
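For what it's worth, a custom span around the Feast call inside the runner would look roughly like this (the span/service names and feature refs are made up; I haven't verified that the span survives the runner process boundary):
from ddtrace import tracer
from feast import FeatureStore

store = FeatureStore(repo_path="feature_repo")  # hypothetical Feast repo path

def fetch_features(entity_rows):
    # Wrap the external call in an explicit span; if the runner process is
    # instrumented at all, this should show up in Datadog APM.
    with tracer.trace("feast.get_online_features", service="feature-store"):
        return store.get_online_features(
            features=["driver_stats:avg_daily_trips"],  # hypothetical feature ref
            entity_rows=entity_rows,
        ).to_dict()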
n
Hey @Shihgian Lee, we are successfully functioning with Datadog + New Relic as of now. Once our app evolves to have multiple external connections, I will keep you posted about the logging changes.
πŸ™ 1
c
@Sean could you help look into the tracing issue Shihgian mentioned above?
j
"I believe the traces are lost due to the new runner architecture; specifically micro-batching at the runner level."
Hi @Shihgian Lee. Tracing over the new architecture is already included in the design. In our tests with the latest version, it was working.
It seems not to work in your case; can you provide further details, like what you observed?
s
@Navya Dalavayi I think runners are not getting instrumented because they spawn in separate processes that patch_all was not able to instrument. While we are trying to understand more about ddtrace, have you looked into OTLP ingestion using the Datadog Agent? BentoML comes with OTLP tracing support; users just have to configure the agent endpoint.
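Roughly, the configuration would look something like the sketch below. The key names and the agent address are assumptions on my part, so please double-check them against the BentoML configuration docs for your version and your Datadog Agent's OTLP settings:
# configuration.yaml - sketch only; verify key names against the BentoML docs
api_server:
  tracing:
    type: otlp            # export spans via OTLP instead of zipkin/jaeger
    sample_rate: 1.0
    otlp:
      protocol: grpc
      # hypothetical in-cluster Datadog Agent address, default OTLP gRPC port
      url: http://datadog-agent.monitoring.svc.cluster.local:4317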
s
"I think runners are not getting instrumented because they spawn in separate processes that patch_all was not able to instrument."
@Sean That is correct - the runner is in a separate process and tracing might be lost. I have moved away from patch_all and replaced it with ddtrace-run. This is a better solution since we don't have to worry about where we should add patch_all in our code. However, ddtrace-run didn't solve our problem, which is expected because we didn't do anything differently to address the separate process for the runner.
OTLP ingestion is not part of our infrastructure. We will have to look into it and discuss it with our DevOps team. Thanks for the suggestion! However, this won't be a near-term fix for us because there are factors that are not within our team's control.
We moved the Feast code out of the runner and into the service API. The Datadog trace shows up for grpc again. Our code is also much simpler. The lesson learned is to keep the runner dead simple (dumb): only make inferences and nothing else.
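Roughly, the layout we ended up with looks like the sketch below (the model tag, feature refs, and entity keys are placeholders, not our real ones):
# serving.py - feature fetching in the API process, inference-only runner
import bentoml
from bentoml.io import JSON
from feast import FeatureStore

store = FeatureStore(repo_path="feature_repo")  # hypothetical Feast repo path

# The runner only wraps the model; no external calls happen inside it.
runner = bentoml.sklearn.get("my_model:latest").to_runner()
svc = bentoml.Service("my_service", runners=[runner])

@svc.api(input=JSON(), output=JSON())
async def predict(payload: dict) -> dict:
    # Fetch features in the API process, where Datadog's grpc integration
    # can see the call to the online store.
    features = store.get_online_features(
        features=["driver_stats:avg_daily_trips"],           # hypothetical feature ref
        entity_rows=[{"driver_id": payload["driver_id"]}],   # hypothetical entity key
    ).to_dict()
    result = await runner.predict.async_run([features["avg_daily_trips"]])
    return {"prediction": result.tolist()}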
🙌 3
c
thanks for sharing the result!
🙌 1