Description
Delivery of server-sent events to clients will become unreliable if more than 1 instance of touca-app is running as a cluster/single service. This may not be an important configuration to support, but I thought I'd flag it. Feel free to close if not relevant.
Environment
Any configuration that can deploy multiple instances of the server container, e.g. K8s, Docker Swarm.
Steps To Reproduce
Deploy as per Environment above, with at least 2 instances of touca-app running.
Connect multiple clients to touca-app.
Complete an action from one of the clients that should trigger the delivery of an event to all connected clients. The event may or may not be broadcast to all appropriate clients.
Expected Behavior
All clients should receive all relevant events, regardless of which instance of touca-app they are connected to.
Additional Context
Not a high-priority fix, but it will definitely come up if you ever need to horizontally scale Touca.
It happens because BullMQ does not support a 'fanout' job distribution pattern, and it doesn't seem like they will any time soon. Since each instance of touca-app keeps track of server-event subscriptions in its own process, and each enqueued job is consumed by exactly one worker and then discarded, only the clients who happen to be connected to the same process as the worker that consumes the job can receive the event.
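To make the failure mode concrete, here is a rough in-memory sketch of what I think is going on. It does not use BullMQ, Redis, or any Touca code. Every name in it (`Instance`, `dispatchWorkQueue`, `dispatchFanout`, the client names) is made up for illustration, and round-robin dispatch is only a stand-in for "exactly one worker consumes each job".

```typescript
// Simulation only: no BullMQ or Redis involved. All names are hypothetical.
type Job = { event: string };

class Instance {
  name: string;
  // Subscriptions live in this process only: client id -> events received.
  received: Record<string, string[]> = {};

  constructor(name: string, clients: string[]) {
    this.name = name;
    for (const client of clients) {
      this.received[client] = [];
    }
  }

  // Deliver an event to the clients connected to this instance only.
  handle(job: Job): void {
    for (const client of Object.keys(this.received)) {
      this.received[client].push(job.event);
    }
  }
}

// Work-queue semantics: each job is consumed by exactly one worker.
// Round-robin is an assumption standing in for workers competing for jobs.
function dispatchWorkQueue(jobs: Job[], workers: Instance[]): void {
  jobs.forEach((job, i) => workers[i % workers.length].handle(job));
}

// Fanout semantics: every worker sees every job.
function dispatchFanout(jobs: Job[], workers: Instance[]): void {
  for (const job of jobs) {
    for (const worker of workers) {
      worker.handle(job);
    }
  }
}

function run(
  dispatch: (jobs: Job[], workers: Instance[]) => void
): Record<string, string[]> {
  const a = new Instance("touca-app-1", ["alice"]);
  const b = new Instance("touca-app-2", ["bob"]);
  dispatch([{ event: "e1" }, { event: "e2" }], [a, b]);
  return { alice: a.received["alice"], bob: b.received["bob"] };
}

const workQueueResult = run(dispatchWorkQueue);
const fanoutResult = run(dispatchFanout);

console.log("work queue:", workQueueResult);
console.log("fanout:", fanoutResult);
```

With work-queue dispatch, each client only sees the events whose jobs happened to land on its own instance. With fanout dispatch, both clients see both events, which is the behavior described under Expected Behavior.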
Low Priority