# opal
1. Keep in mind that you don't need all of the objects, maybe only the id and relation, so that can help keep the data small. For max size I think you should aim for a few GB (this also depends on how you store your data, e.g. a hash map is better than lists). Maybe @Or Weis will have something to add here.
2. It seems like you need to write a custom data fetcher; inside it you can control the logic and the parameters.
3. Can you explain this use case a bit more?
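Point 1 can be sketched like this: a small, hypothetical Python transform that keeps only the fields the policies actually evaluate (here assumed to be `id` and `relation`) before the data is loaded into OPA. The field names and record shape are assumptions for illustration, not part of OPAL's API.

```python
# Hypothetical example: shrink full records down to the two fields the
# policies actually need (id and relation), so the dataset cached in
# OPA stays small. Keying by id gives a hash-map lookup instead of a
# list scan, which is also friendlier to OPA's memory and query time.

def slim(records):
    """Keep only the id and relation of each record, keyed by id."""
    return {r["id"]: {"relation": r["relation"]} for r in records}

full_records = [
    {"id": "u1", "relation": "owner", "name": "Alice", "email": "a@example.com"},
    {"id": "u2", "relation": "viewer", "name": "Bob", "email": "b@example.com"},
]

print(slim(full_records))
# {'u1': {'relation': 'owner'}, 'u2': {'relation': 'viewer'}}
```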
a
Thanks @Oded Bd! Re p. 2: what would be the common practice there (I mean keeping the data up to date with incremental updates)? I have checked this tutorial, but it actually seems to describe how to trigger a full data reload on a client. Re 3: for now it's more for development time, e.g. I messed up the data or changed the OPAL_DATA_CONFIG_SOURCES externally and I want all the clients to reload the data.
and one more on top of that - what does save_method mean in the context of data source config?
sorry was in meetings - reading now 👀
As you can imagine, those datasets that needed to be cached can grow huge. How to optimize that? How much data is recommended to store in OPA/L’s cache?
This depends heavily on your setup, but as a rule of thumb: ideally stay below 1 GB; 2 GB is bearable; above 5 GB OPA will fail in most cases.
Once all the data is loaded, how do I make sure it stays updated? Is it by sending new portions of data via data update triggers (like described here)? Is there a way for OPAL to poll for new data updates for each endpoint with a parameter like “lastUpdatedAt” to only get the new data?
Yes, you keep things up to date by triggering updates. There's no "lastUpdatedAt" feature in OPAL currently, though it could make sense (you're welcome to open a PR or issue for this). As Oded suggested, you can implement something like this using a custom data fetcher. Usually, since you can assume the agent got all the previous updates (on any disconnect the client will refetch everything from the baseline), you can just add things sequentially without needing to know explicitly what came before. That is the common practice.
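A minimal sketch of that cursor idea, assuming a hypothetical backend that accepts a `lastUpdatedAt` query parameter. None of these names come from OPAL itself; they just illustrate the incremental-fetch logic a custom data fetcher could implement.

```python
# Hypothetical incremental-fetch logic for a custom data fetcher.
# The endpoint, the lastUpdatedAt parameter, and the record shape are
# all assumptions for illustration, not OPAL APIs.
from urllib.parse import urlencode

def build_fetch_url(base_url, last_updated_at=None):
    """Build the URL for the next fetch; with no cursor we fetch
    everything (the baseline), afterwards only the delta."""
    if last_updated_at is None:
        return base_url
    return f"{base_url}?{urlencode({'lastUpdatedAt': last_updated_at})}"

def apply_delta(cache, delta):
    """Merge newly fetched records into the local dataset and advance
    the cursor to the newest timestamp seen."""
    cursor = None
    for record in delta:
        cache[record["id"]] = record["value"]
        cursor = max(cursor or record["updated_at"], record["updated_at"])
    return cursor

cache = {}
cursor = apply_delta(cache, [
    {"id": "u1", "value": "owner", "updated_at": "2023-01-01T00:00:00Z"},
    {"id": "u2", "value": "viewer", "updated_at": "2023-01-02T00:00:00Z"},
])
print(build_fetch_url("https://example.com/roles", cursor))
```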
for now it's more for development time, e.g. I messed up the data or changed the OPAL_DATA_CONFIG_SOURCES externally and I want all the clients to reload the data.
By restarting the server you'd achieve just that. Alternatively, you can send all of the clients (by topics) a data update that overrides everything.
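For the second option, the update event could look roughly like this. The payload shape follows OPAL's data-update entries (url, topics, dst_path, save_method), but treat the exact field set and the endpoint you post it to as assumptions to verify against the OPAL docs; the URL below is a placeholder.

```python
# Sketch of a data-update payload telling every client subscribed to
# the "policy_data" topic to refetch from the source and overwrite its
# entire data root. Verify field names against the OPAL docs.
import json

update = {
    "entries": [
        {
            "url": "https://example.com/policy-data",  # placeholder source
            "topics": ["policy_data"],
            "dst_path": "/",       # write at the root, replacing everything
            "save_method": "PUT",  # overwrite rather than merge
        }
    ],
    "reason": "dev reset: reload all data",
}

print(json.dumps(update, indent=2))
```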
and one more on top of that - what does save_method mean in the context of data source config?
It's the HTTP method that will be used against OPA when saving the data to it.
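To make the method concrete, here is a toy in-memory model of a data store keyed by path. It only illustrates why the method matters: PUT replaces whatever is at the destination path wholesale. This is a simplification, not OPA's actual /v1/data semantics (OPA's PATCH, for instance, takes JSON Patch operations).

```python
# Toy model of saving fetched data into a store at dst_path.
# PUT semantics: replace the value at the path wholesale.
# Illustrative only, not OPA's real /v1/data implementation.

store = {"roles": {"u1": "owner", "u2": "viewer"}}

def save(store, dst_path, data, method="PUT"):
    key = dst_path.strip("/")
    if method == "PUT":
        if key:
            store[key] = data      # replace just this subtree
        else:
            store.clear()
            store.update(data)     # dst_path "/" replaces everything
    else:
        raise NotImplementedError("only PUT is modeled in this sketch")
    return store

save(store, "/roles", {"u3": "editor"})
print(store)  # {'roles': {'u3': 'editor'}}  (old roles replaced, not merged)
```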
Thank you for an extensive explanation!
One more question I have, sorry for bombing 🙂 We have microservices deployed in several regions, and the tenancy support is a good fit. I wonder, though, how should we go with OPAL servers? Should we deploy a server + client per region, or one server with clients in every region, all connecting to one "central" server?
This is a matter of architecture, so there can be a lot of good answers. The key, in my opinion, is one OPAL client per environment, as you said, to keep latency low. After that I would go with one server unless I see that latency or load become a problem.
Thanks! And with that kind of setup, what's the difference between OPAL_POLICY_SUBSCRIPTION_DIRS and Scopes? It feels like they serve the same purpose, just that you can configure a bit more with Scopes.
Scopes isolate clients within a server, allowing for different repositories and branches; we use Scopes for multi-tenancy of clients on OPAL. Policy subscription dirs allow different clients to subscribe to different Rego files/folders within the same repo.
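For example, per-client env vars could look like this. The directory names are placeholders, and the multi-directory separator is an assumption; check the OPAL configuration docs for your version.

```shell
# Hypothetical per-client configuration: each client subscribes only
# to the policy directories it cares about within the shared repo.

# Client for service A: only the "billing" policies
OPAL_POLICY_SUBSCRIPTION_DIRS="billing"

# Client for service B: two policy folders
# (separator assumed to be a colon; verify against the OPAL docs)
OPAL_POLICY_SUBSCRIPTION_DIRS="inventory:shipping"
```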