Slackbot
06/15/2023, 11:02 AM

Raz Co
06/15/2023, 11:39 AM

Ben Wallis
06/15/2023, 12:02 PM

Raz Co
06/15/2023, 12:29 PM
The OPA transaction failed because of the connection failure to the API. That basically means OPAL Client couldn't load the initial data into OPA.
What happens when you deploy the Client separately, while the API is already ready to serve requests?
Depending on the answer here, you might want to change the retry count of the data updater, or add lifecycle hooks to the containers (I need to research how to implement that in this case).
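(One way to express the "wait for the API" idea in Kubernetes - a close cousin of the lifecycle hooks Raz mentions - is an initContainer that blocks the pod's main containers, including the OPAL Client sidecar, until the data API answers. This is only a sketch: the curl image tag and the /authz-data path are placeholders, and it only helps when the data API is served from outside the pod being started.)
initContainers:
- name: wait-for-data-api
  image: curlimages/curl:8.8.0          # any small image with curl works
  command:
  - sh
  - -c
  - |
    # Poll the (hypothetical) data endpoint until it returns a 2xx response.
    until curl -sf http://service-a.multi-tenant.svc.cluster.local/authz-data > /dev/null; do
      echo "data API not ready yet, retrying..."
      sleep 2
    done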
Ben Wallis
06/15/2023, 12:34 PM
The initial data sources are defined in OPAL_DATA_CONFIG_SOURCES, and I can't ever guarantee which other pods will be available at the point that OPAL Client starts.
According to the docs here https://docs.opal.ac/tutorials/healthcheck_policy_and_update_callbacks#-opa-healthcheck-policy, if I enable the health check policy it'll only return ready when the initial data sources from OPAL_DATA_CONFIG_SOURCES have been successfully synced. Doesn't this suggest that there should be infinite retries until this succeeds? Otherwise the pod would be stuck with a failed healthcheck indefinitely.
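(For reference, the healthcheck policy described in that tutorial is switched on via an OPAL Client environment variable along these lines - treat the exact variable name as an assumption and confirm it against the linked docs:)
env:
- name: OPAL_OPA_HEALTH_CHECK_POLICY_ENABLED   # assumed name - verify in the OPAL docs
  value: "True"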
Ben Wallis
06/15/2023, 12:52 PM
/v1/data/system/opal/healthy returns HTTP 200 with a body of {"result":false} when it's not ready - this means the endpoint can't be used as an httpGet readinessProbe target, since the probe treats any HTTP 200 as "ready".
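(To make that concrete: the naive probe below - just a sketch - would mark the container ready as soon as OPA is listening, even while the result is still false, because httpGet probes only look at the status code.)
readinessProbe:
  httpGet:
    path: /v1/data/system/opal/healthy
    port: 8181
  periodSeconds: 10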
Ben Wallis
06/15/2023, 1:06 PM
I do have curl available in the container, which enables me to configure a readinessProbe like so:
readinessProbe:
  exec:
    command:
    - sh
    - -c
    - curl -s http://localhost:8181/v1/data/system/opal/healthy | grep 'true'
  failureThreshold: 10
  initialDelaySeconds: 5
  periodSeconds: 10
  timeoutSeconds: 5
This doesn't solve the problem though - it just means I can now correctly identify the application pod as unsuitable for use because the OPA data hasn't been populated.
Ben Wallis
06/15/2023, 1:33 PM
It seems the initial fetch only waits up to OPAL_FETCHING_CALLBACK_TIMEOUT - it doesn't care if it succeeds after this time, which is why the OPA transaction fails. However, I think I have a bigger problem.
As I mentioned earlier, some of my services with OPAL Client sidecar containers are the source of part of their own authorization data - I have the fetcher config configured with k8s service addresses, which causes a problem for the OPAL Client sidecar that needs to fetch from the service within the same pod as itself. When doing a new deployment, the cluster service DNS is still pointing to the previous instance of the pod because the new one isn't ready yet. I think I might need to look at using topics to allow those services to use localhost instead of a cluster DNS address. Perhaps I'm doing something more fundamentally wrong with this design though 😅
Raz Co
06/15/2023, 2:15 PM
I'd go with localhost like you said.
There is no reason to put 2 (or more) containers within one Pod and have them communicate through the load-balancing component (Service or Deployment FQDN); that's a bad practice when it comes to sidecar containers.
Ben Wallis
06/15/2023, 2:16 PM
The readinessProbe on the service being fetched from has initialDelaySeconds set to 30, meaning it was never available before OPAL_FETCHING_CALLBACK_TIMEOUT expired.
As for why I'm using the service FQDN: the initial configuration as specified in OPAL_DATA_CONFIG_SOURCES comes from OPAL Server, right? There's only one instance of that configuration, and if I have Service A, Service B and Service C that all have OPAL Client sidecars that need data from Service A, then I need a URL in that data config that can be accessed by all 3 services, not just Service A, so localhost wouldn't work.
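(For illustration, the single shared configuration Ben describes would look roughly like this on the OPAL Server - a sketch following the config.entries format from the OPAL docs; the /authz-data path, dst_path and topic name are placeholders. Every OPAL Client - Service A's, B's and C's - receives this same entry, so the URL has to resolve from all of them:)
env:
- name: OPAL_DATA_CONFIG_SOURCES
  value: '{"config":{"entries":[{"url":"http://service-a.multi-tenant.svc.cluster.local/authz-data","topics":["policy_data"],"dst_path":"/roles"}]}}'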
Ben Wallis
06/15/2023, 2:52 PM
With hostAliases I can configure a pod to resolve its own service FQDN to 127.0.0.1:
hostAliases:
- ip: "127.0.0.1"
  hostnames:
  - "service-a.multi-tenant.svc.cluster.local"
This means that OPAL Server can give out the same service FQDN to Service A, Service B and Service C, but Service A won't be dependent on its own service FQDN (which won't be available until that pod's OPAL Client readinessProbe succeeds).
This kind of setup doesn't seem like it would be unique to what we're doing, so I'd be interested to know if there's an obvious design issue here.
To give a bit more context: Service A is responsible for serving user roles/permissions - these are used as policy data which is fed into OPA via OPAL. Service B has ABAC-style policy data that is also fed into OPA via OPAL. Both of those services' public APIs have policy checks against OPA from their sidecar, and I have a cluster-internal API on both of these services that serves data in the correct JSON format to store into OPA via OPAL.
Is this an unusual setup? It doesn't seem particularly unusual, but surely anyone using this kind of setup would hit the same issue of a service needing a sidecar that needs data from itself?
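(Pulling the pieces of this thread together, a Service A pod spec could look roughly like the sketch below. It is only illustrative: the image tags, app port, OPAL Server address and the /authz-data path are placeholders, it assumes curl is present in the sidecar image, and the hostAliases and readinessProbe blocks are the ones Ben posted above.)
apiVersion: v1
kind: Pod
metadata:
  name: service-a
  namespace: multi-tenant
spec:
  # Resolve the pod's own service FQDN to loopback so the OPAL Client sidecar
  # can fetch from the app container without going through cluster DNS.
  hostAliases:
  - ip: "127.0.0.1"
    hostnames:
    - "service-a.multi-tenant.svc.cluster.local"
  containers:
  - name: service-a                        # application container (placeholder image)
    image: example/service-a:latest
    ports:
    - containerPort: 8080                  # assumed port serving the authz-data JSON
  - name: opal-client                      # OPAL Client sidecar (embeds OPA on 8181)
    image: permitio/opal-client:latest
    env:
    - name: OPAL_SERVER_URL                # placeholder OPAL Server address
      value: "http://opal-server.multi-tenant.svc.cluster.local:7002"
    readinessProbe:                        # the exec probe from earlier in the thread
      exec:
        command:
        - sh
        - -c
        - curl -s http://localhost:8181/v1/data/system/opal/healthy | grep 'true'
      failureThreshold: 10
      initialDelaySeconds: 5
      periodSeconds: 10
      timeoutSeconds: 5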
Ben Wallis
07/26/2023, 12:18 PM

Raz Co
07/26/2023, 8:05 PM

Raz Co
07/26/2023, 8:06 PM

Ben Wallis
07/26/2023, 8:07 PM