# opal
o
Hi @Robert Philp Can you share a little more on the performance issues you are experiencing? You are right in general about the formula for workers to cores. `CLIENT_API_SERVER_WORKER_COUNT` only affects running as a CLI, in which case, yes, it sets the uvicorn workers. I'd recommend the same cores logic for the `fetching_worker_count`. I wouldn't reduce the number of workers for the sake of OPA, as its concurrency isn't tied to any of the other workers (aside from the fetchers, but that's only for a small portion of the write operations into OPA by the workers).
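(For illustration only: the exact workers-to-cores formula referenced above isn't spelled out in this thread; the sketch below assumes the common "2 × cores + 1" heuristic often used for uvicorn/gunicorn-style workers, applied to both settings discussed.)

```python
# Illustrative sketch, assuming the common "2 * cores + 1" heuristic for
# uvicorn-style worker counts; the same cores-based logic is applied to the
# fetching worker count, as suggested above.
import multiprocessing

cores = multiprocessing.cpu_count()
api_server_workers = 2 * cores + 1   # candidate value for CLIENT_API_SERVER_WORKER_COUNT
fetching_workers = 2 * cores + 1     # same logic for the fetching worker count
print(cores, api_server_workers, fetching_workers)
```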
r
@Or Weis - we are seeing higher than expected durations between the logs in the opal-server and the opal-client, ~200ms (typically closer to 15ms for other containers in the same network making REST calls). I set the uvicorn workers in the opal-client, but this resulted in an error message about the broadcast URI not being set. Is the intention to only ever run the client on a single worker? I didn't think it needed to run with a backbone? I can't see any difference in the logs when I change the `fetching_worker_count`. Is this limited by the number of workers? How does this work?
We're also getting a similar effect in the custom fetcher, where logs on either side of the HTTP request show a gap of ~750ms, but the trace within the service being called only shows a request duration of ~100-150ms. Just digging into this to see if I can find any more details of why.
o
Yes, I believe the client has only one worker at a time, aside from the fetchers. The fetcher worker count will only become apparent if you launch multiple concurrent data updates. Interesting. Any other differences between the containers?
Volume of data maybe?
Could also be indexing time into OPA itself
@Shaul Kremer suggests it can also be policy indexing time
r
Thanks for the suggestions @Or Weis. I've managed to replicate this in a test environment with one opal-client and one opal-server container running. The reason turned out to be a function running synchronously in one of the workers 🙄 However, the behaviour wasn't what I was expecting. If I send 4 Kafka messages (aimed at the same custom fetcher) that are then consumed by opal-server, the OPAL client picks each of these up only after the synchronous function in the previous fetcher has completed, hence an increasing delay in the OPAL client acknowledging each message in turn. I wasn't expecting the OPAL client to be blocked by a synchronous function in a fetcher, and I wasn't expecting one fetcher to block all fetchers when `FETCHING_WORKER_COUNT` is set to 9. I was assuming this behaviour would only happen once all 9 workers were blocked. Suspect I'm misunderstanding the intention here? This would suggest that `FETCHING_WORKER_COUNT` either isn't being set correctly or isn't doing what I thought it was doing. Should this be setting the number of workers for the fetchers?
o
Hi @Robert Philp - I wouldn't expect this behavior either. Which fetcher are you using? I'm guessing not one we wrote. Can you share the code for the fetcher? Perhaps, in addition to having synchronous code, the fetcher's code is also locking on a shared object or resource of some sort, forcing all workers to run in a sync fashion? Of course there could be a bug in our code, but it is a rather classic part of the project that hasn't been changed in over a year. https://github.com/permitio/opal/blob/47e87e547dae8371686fb909278d0e0a4cfb76ab/packages/opal-common/opal_common/fetcher/engine/fetch_worker.py https://github.com/permitio/opal/blob/7fbc2f4896bce0e10700e61ad749e41930edafa5/packages/opal-common/opal_common/fetcher/engine/fetching_engine.py
@Ro'e Katz could use another eye here 😉
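(Illustration of the behaviour being discussed, not OPAL's actual fetch-worker code: when all fetch workers share one asyncio event loop, a single synchronous call blocks every worker, no matter how many workers are configured.)

```python
# Minimal asyncio sketch: one fetcher making a blocking (synchronous) call
# stalls the shared event loop, so all other worker tasks are delayed too.
import asyncio
import time


async def fetcher(name: str, t0: float, blocking: bool) -> None:
    if blocking:
        time.sleep(2)            # synchronous call, e.g. a blocking auth request
    else:
        await asyncio.sleep(2)   # cooperative: yields the loop to other workers
    print(f"{name} done at t={time.monotonic() - t0:.1f}s")


async def main() -> None:
    t0 = time.monotonic()
    # Nine "workers" change nothing here: the single blocking call freezes the
    # shared event loop, so the eight well-behaved fetchers can only start
    # their 2s waits after it returns (~4s total instead of ~2s).
    await asyncio.gather(
        fetcher("blocking-fetcher", t0, blocking=True),
        *(fetcher(f"worker-{i}", t0, blocking=False) for i in range(8)),
    )


asyncio.run(main())
```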
r
Yes, it's a custom fetcher. The synchronous code was part of an authentication call in the fetcher's `__aenter__` function. I'm wondering if this explains it? And whether moving it into the `_fetch_` function would resolve it? We'd really want to be caching the token here and only refreshing it when needed. Is there any best practice advice on how to structure this in the fetchers?
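(A rough sketch of the token-caching pattern being asked about; the names below, such as `sync_authenticate` and `TokenCache`, are illustrative assumptions and not part of OPAL's fetch-provider API. The idea is to keep the event loop free by running any synchronous auth call in a thread and to refresh the cached token only when it expires.)

```python
import asyncio
import time
from typing import Optional


def sync_authenticate() -> tuple[str, float]:
    """Placeholder for the blocking auth call (e.g. a requests-based login).

    Returns a (token, expiry_timestamp) pair; the real call is an assumption.
    """
    time.sleep(1)  # stands in for the slow synchronous HTTP round-trip
    return "example-token", time.time() + 3600


class TokenCache:
    """Caches a token and refreshes it off the event loop only when needed."""

    def __init__(self) -> None:
        self._token: Optional[str] = None
        self._expires_at: float = 0.0
        self._lock = asyncio.Lock()

    async def get(self) -> str:
        async with self._lock:  # avoid concurrent refreshes from many fetch workers
            if self._token is None or time.time() >= self._expires_at - 60:
                # Run the blocking call in a worker thread so other fetchers keep running.
                self._token, self._expires_at = await asyncio.to_thread(sync_authenticate)
            return self._token


async def demo() -> None:
    cache = TokenCache()
    # Only the first call pays the auth cost; later calls reuse the cached token.
    print(await cache.get())
    print(await cache.get())


asyncio.run(demo())
```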
o
That actually sounds okay - though how you implemented it could still hide a lock on a shared resource that is causing your problem. To try and zero in on the problem, I'd suggest you maybe replace the authentication flow you have there with a fixed cookie / secret - just to test whether that part of the code is the culprit.
👍 1
c
Hey - related question on this… https://docs.opal.ac/getting-started/configuration in your docs you have `FETCHING_WORKER_COUNT` - I'm wondering if in practice this needs to be prefixed with `OPAL` (i.e. `OPAL_FETCHING_WORKER_COUNT`)?
o
Yes. All env vars need to be prefixed with `OPAL_`, without exception.
👍 1
If you missed that, yeah that definitely can be an issue 😅
r
I think we're getting to the bottom of this now. Thanks for your help! One more question on this: given the client only uses one worker, does this mean that it only ever uses one processor thread? Or is something else going on with the workers that means they can run across multiple threads? I'm just thinking about the CPU configuration for the client containers.
o
Yes. For the incoming messages (but only for them) the OPAL client is single threaded. But it's super lightweight - messages shouldn't contain data. And everything else is multithreaded.
👍 1
r
Thanks