# opal
o
This sounds like a misconfiguration of the broadcaster channel, which is used to sync between the workers of the OPAL server. You need to either configure a working broadcaster (aka backbone pub/sub) or configure the server to run with only one worker.
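(For reference, the two options above usually come down to server environment variables along these lines; `OPAL_BROADCAST_URI` and `UVICORN_NUM_WORKERS` are the variable names that come up later in this thread, and the Redis host is a placeholder:)

```
# Option A: every server worker/replica shares the same backbone pub/sub
OPAL_BROADCAST_URI=redis://your-redis-host:6379

# Option B: no backbone at all, but then the server must run as a single replica with a single worker
UVICORN_NUM_WORKERS=1
```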
k
We're just using a normal Redis instance for our broadcaster channel (`redis://url_here:6379`). Is there any other configuration we need to do? We've been combing through the docs but don't see any issues with our setup.
o
Hard to tell without additional info. Can you share the server / client logs? At the client, are you seeing the event register in the logs? Are all the clients subscribing to the same topic or scope? I would also try running the OPAL server with a single worker and see whether the problem persists, just to rule out the broadcaster issue.
k
I don't think I'm able to share logs, but the client does not receive the data updates, and the clients are all subscribed to the same group of topics. The initial data load we do on startup works across all clients; it's just the continuous updates that seem to only reach one or a few clients. With 1 worker only 1 client would receive updates; with 4 workers we saw 2 clients receiving updates.
o
Sounds very odd. Are the updates extremely heavy? These are data updates, right? Are you seeing any issues with policy updates?
k
the updates are small and yup these are just data updates. we don't have many policy updates so haven't seen issues with that yet
o
Are you using the latest version of OPAL?
k
ah we are not - we'll give that a try
o
CC: @Asaf Cohen, @Ro'e Katz, @Shaul Kremer
s
(Following up for Kevin): We updated to the latest version of OPAL and we’re still seeing the same issues.
I’m going to try running the OPAL server with a single worker to see if that helps.
o
Mmm... I'd really need more information here to help you debug this. Could you verify in the client logs that there's no "Updating policy data" entry? I want to make sure the issue is the events not arriving, and not a failure to process them. Also, of course, check for any errors or exceptions in both the client and server logs. Try changing the data update and switching up topics. Could this be an underlying network issue? Can you replicate this on a local/single-machine setup? I'll try to think of more avenues to explore, but more leads would be helpful.
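(A quick way to separate "events not arriving" from "events arriving but failing to process" is to publish a tiny inline data update against the server's REST API and then grep each client's logs for "Updating policy data". A minimal Python sketch, assuming the default `/data/config` endpoint on port 7002; the token header is only needed if auth is enabled, and the topic, path and payload are placeholders:)

```python
import requests

OPAL_SERVER_URL = "http://localhost:7002"   # assumed default OPAL server address
TOKEN = "<datasource-or-server-token>"      # placeholder; omit the header if auth is disabled

# One small inline entry, published to the topic the clients subscribe to.
update = {
    "entries": [
        {
            "url": "",                      # empty url: ship the data inline instead of fetching it
            "data": {"ping": "pong"},
            "topics": ["policy_data"],
            "dst_path": "/debug/ping",      # hypothetical OPA document path for the test
            "save_method": "PUT",
        }
    ],
    "reason": "debug: checking which clients receive updates",
}

resp = requests.post(
    f"{OPAL_SERVER_URL}/data/config",
    json=update,
    headers={"Authorization": f"Bearer {TOKEN}"},
)
resp.raise_for_status()
print("published, status:", resp.status_code)
```

Every client that actually receives the event should log an "Updating policy data" line shortly afterwards; clients that stay silent never got it.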
s
It did work with the broadcast URL off. Once we added a Redis OPAL_BROADCAST_URI, it would only send data to 1 of the opal-clients.
o
So it sounds like there's an issue with either the specific Redis setup or the broadcaster wrapper for it. Could you maybe try Postgres LISTEN/NOTIFY (that's what we usually use at Permit) or Kafka?
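(One way to rule the Redis deployment in or out, independently of OPAL, is to exercise the same URI with a generic pub/sub client, e.g. the open-source `broadcaster` package, which provides the same kind of Redis-backed channel the OPAL server uses as its backbone. A rough sketch, assuming `pip install "broadcaster[redis]"` and the same value as `OPAL_BROADCAST_URI`; run two copies in separate shells and check that each one sees the other's ping:)

```python
import asyncio
import os

from broadcaster import Broadcast  # pip install "broadcaster[redis]"

BROADCAST_URI = "redis://url_here:6379"  # same value as OPAL_BROADCAST_URI

async def main():
    broadcast = Broadcast(BROADCAST_URI)
    await broadcast.connect()
    try:
        async with broadcast.subscribe(channel="opal_debug") as subscriber:
            # brief pause so the subscription is registered before we publish
            await asyncio.sleep(0.5)
            await broadcast.publish(channel="opal_debug", message=f"ping from pid {os.getpid()}")
            async for event in subscriber:
                print("received:", event.message)  # stop with Ctrl+C when done
    finally:
        await broadcast.disconnect()

asyncio.run(main())
```

If the two processes cannot see each other's pings, the problem is in the Redis instance or the network path to it rather than in OPAL itself.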
k
We were getting no "Updating policy data" logs yesterday when we were viewing all the OPAL clients running on our hosts.
j
Isn’t the broadcast URL thing only for server sync? Or is that for clients? I thought clients did a socket pub/sub to the OPAL servers.
o
> Isn’t the broadcast URL thing only for server sync? Or is that for clients? I thought clients did a socket pub/sub to the OPAL servers.

It’s for the servers to sync. But if client-1 subscribes to server-1 (or worker 1) and you publish to server-2, then unless the servers are connected via the broadcaster the message won’t reach server-1, and so won’t reach client-1.
j
Yeah 🙂 I’d expect that … sounded in the above like one single server 😉 and lots of clients so … kinda curious to see what comes of this - I turned on the PS syncer 🙂
r
@Steven Daniels @Kevin Lin Like @Or Weis, my first guess would be that there’s something wrong with your Redis instance or its reachability. When trying to run the server with 1 worker, have you decreased both the pod replicas and `UVICORN_NUM_WORKERS` to 1? (When doing so, it’s not even necessary to turn the broadcaster off.) If you’re willing to share your server and client configurations (or just parts of them), that might help us figure out what the issue might be.
s
Some follow-up: there was a problem with our Redis configuration. It’s been pretty difficult to understand where the problem is by following the logs. One thing I noticed is that a working connection has more logs than one that is not working properly. Specifically, it included more logs for:
1. `opal_common.topics.publisher | DEBUG | stopping topic publisher`
2. connection made/lost logs from `asyncio_redis.protocol`

```
2023-09-05T14:10:38.759930+0000 | opal_server.data.data_update_publisher  | INFO  | [23] Publishing data update to topics: {'policy_data'}, reason: because, entries: [{'url': '', 'method': 'PATCH', 'path': '/food/for', 'inline_data': True, 'topics': ['policy_data']}]
2023-09-05T14:10:38.760169+0000 | opal_common.topics.publisher            |DEBUG  | started topic publisher
2023-09-05T14:10:38.760300+0000 | opal_common.topics.publisher            |DEBUG  | stopping topic publisher
2023-09-05T14:10:38.760524+0000 | fastapi_websocket_pubsub.pub_sub_server |DEBUG  | Publishing message to topics: ['policy_data']
2023-09-05T14:10:38.760662+0000 | fastapi_websocket_pubsub.pub_sub_server |DEBUG  | Acquiring broadcaster sharing context
2023-09-05T14:10:38.760808+0000 | fastapi_websocket_pubsub.event_broadc...|DEBUG  | Did not subscribe to ALL_TOPICS: share count == 7
2023-09-05T14:10:38.761013+0000 | fastapi_websocket_pubsub.event_notifier | INFO  | calling subscription callbacks: topic=policy_data, subscription_id=47584a88bc844224b34a4a212f570288, subscriber_id=3569da5a07454b749e91571dc8423470
2023-09-05T14:10:38.761689+0000 | fastapi_websocket_pubsub.rpc_event_me...| INFO  | Notifying other side: subscription={'id': '47584a88bc844224b34a4a212f570288', 'subscriber_id': '3569da5a07454b749e91571dc8423470', 'topic': 'policy_data', 'notifier_id': None}, data=id='2' entries=[DataSourceEntry(url='', data={'thought': 'uses policy_data', 'key': 'value', 'services': ['1', '2']}, config=None, topics=['policy_data'], dst_path='/food/for', save_method='PATCH')] reason='because' callback=UpdateCallback(callbacks=[]), channel_id=3569da5a07454b749e91571dc8423470
2023-09-05T14:10:38.762027+0000 | fastapi_websocket_rpc.rpc_channel       |DEBUG  | Calling RPC method - {'message': RpcMessage(request=RpcRequest(method='notify', arguments={'subscription': Subscription(id='47584a88bc844224b34a4a212f570288', subscriber_id='3569da5a07454b749e91571dc8423470', topic='policy_data', notifier_id=None), 'data': DataUpdate(id='2', entries=[DataSourceEntry(url='', data={'thought': 'uses policy_data', 'key': 'value', 'services': ['1', '2']}, config=None, topics=['policy_data'], dst_path='/food/for', save_method='PATCH')], reason='because', callback=UpdateCallback(callbacks=[]))}, call_id='034ff542b9f94705bd3da54fea773d50'), response=None)}
2023-09-05T14:10:38.762701+0000 | fastapi_websocket_pubsub.event_notifier | INFO  | calling subscription callbacks: topic=policy_data (ALL_TOPICS), subscription_id=f0112c744fa74bcdbdc9b2980af5d8c2, subscriber_id=849b01350ed044079d827607ac4fa02f
2023-09-05T14:10:38.762865+0000 | fastapi_websocket_pubsub.event_broadc...| INFO  | Broadcasting incoming event: {'topic': 'policy_data', 'notifier_id': '849b01350ed044079d827607ac4fa02f'}
2023-09-05T14:10:38.763098+0000 | asyncio_redis.connection                | INFO  | Connecting to redis
2023-09-05T14:10:38.764908+0000 | asyncio_redis.protocol                  | INFO  | Redis connection made
2023-09-05T14:10:38.765122+0000 | asyncio_redis.connection                | INFO  | Connecting to redis
2023-09-05T14:10:38.766236+0000 | asyncio_redis.protocol                  | INFO  | Redis connection made
2023-09-05T14:10:38.772320+0000 | asyncio_redis.protocol                  | INFO  | Redis connection lost
2023-09-05T14:10:38.772750+0000 | asyncio_redis.protocol                  | INFO  | Redis connection lost
2023-09-05T14:10:38.772808+0000 | fastapi_websocket_pubsub.event_broadc...| INFO  | Handling incoming broadcast event: {'topics': ['policy_data'], 'src': '849b01350ed044079d827607ac4fa02f'}
2023-09-05T14:10:38.772924+0000 | fastapi_websocket_pubsub.event_broadc...| INFO  | Handling incoming broadcast event: {'topics': ['policy_data'], 'src': '849b01350ed044079d827607ac4fa02f'}
```
It would be really helpful if there were more errors when the broadcast isn’t working properly.
o
We are actually looking to get rid of the broadcaster altogether.
👍 1
It's @Asaf Cohen's archenemy at this point
🤣 1