Handling WebSocket connections spread across different replicas - Kubernetes

The backend runs on Kubernetes with N replicas.
Frontend users (browsers) listen on a WebSocket for real-time messages.
Since the backend has N replicas, each browser tab may end up connected to a different one.
Let's say there are users A, B, and C, each with 2 open tabs: A1 A2 B1 B2 C1 C2.
Backend-replica-1 may hold the WebSocket connections of A1 and B1
Backend-replica-2 may hold the WebSocket connections of A2, B2, C1, and C2
(Please correct me if I'm wrong up to here, but as far as I know and have tested, this is how it works.)
To broadcast a message to ALL users, I publish a notification to RabbitMQ with the message I want. I am using a fanout exchange, so that each backend replica consumes that notification. Then each server sends the message to all users (connections) that it holds.
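Roughly, each replica does something like the following (a minimal sketch with pika; the exchange name, the per-replica exclusive queue, and the send_to_local_connections helper are just illustrative, not my exact code):

    import json
    import pika

    # State held by this replica: the WebSocket connections it accepted (filled elsewhere).
    ACTIVE_SOCKETS = set()

    def send_to_local_connections(message):
        # Hypothetical helper: push the payload to every connection this replica holds.
        for ws in ACTIVE_SOCKETS:
            ws.send(json.dumps(message))

    connection = pika.BlockingConnection(pika.ConnectionParameters(host="rabbitmq"))
    channel = connection.channel()

    # Fanout exchange: every bound queue receives a copy of each published message.
    channel.exchange_declare(exchange="broadcast", exchange_type="fanout")

    # Exclusive, auto-named queue -> one private queue per replica.
    queue = channel.queue_declare(queue="", exclusive=True).method.queue
    channel.queue_bind(exchange="broadcast", queue=queue)

    def on_broadcast(ch, method, properties, body):
        send_to_local_connections(json.loads(body))

    channel.basic_consume(queue=queue, on_message_callback=on_broadcast, auto_ack=True)
    channel.start_consuming()

    # Publishing side (from any replica or service):
    # channel.basic_publish(exchange="broadcast", routing_key="", body=json.dumps(msg))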
One question is whether this is the correct way of doing it, or whether there is a better way. Would it be more sensible to have a separate server that only handles WS connections?
Now, I need a solution to detect when a user has closed all tabs (left the application) and to update that user in the DB. My initial idea was to detect this in the WebSocket disconnect event by checking whether the user has any active connections left. But doing that seems very complex, because the user may have opened 3 tabs, each connected to a different backend server.
Does anyone have a clue on how to achieve this?

To keep track of who is connected and who has closed all tabs, you need a single DB-like instance. All you need to do is publish an event to that instance when someone connects or disconnects.
A simple implementation would be to have a dictionary with the user identifier as key and the number of sessions as the value.
When a user connects to a websocket, you publish a connected event that increments the number of sessions.
When a user disconnects, you publish a disconnected event that decrements the number of sessions and returns the remaining number of sessions, or just a boolean indicating whether the count has reached 0. Then you can do whatever you need to do to clean up the user's session.
Note: you need to synchronize the write operation if multiple threads will be receiving the events. A good solution is to use a SQL database, which supports ACID out of the box.
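For illustration, here is a minimal sketch of that counter using Redis, whose INCR/DECR operations are atomic (a SQL table updated inside a transaction works just as well); the key naming and the mark_user_offline helper are assumptions:

    import redis

    r = redis.Redis(host="redis", port=6379)

    def on_connect(user_id: str) -> int:
        """Increment the user's session count; returns the new count."""
        return r.incr(f"sessions:{user_id}")

    def on_disconnect(user_id: str) -> bool:
        """Decrement the session count; returns True if the user has no tabs left."""
        remaining = r.decr(f"sessions:{user_id}")
        if remaining <= 0:
            r.delete(f"sessions:{user_id}")
            mark_user_offline(user_id)  # hypothetical: update the user's status in the DB
            return True
        return False

    def mark_user_offline(user_id: str) -> None:
        # Placeholder for the real DB update ("user left the application").
        print(f"user {user_id} closed all tabs")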
If you need more control, you can extend the solution into a publish/subscribe architecture, where the connection is bidirectional rather than unidirectional as above. This way, if someone connects in another tab, you can interact with the first tab and show a message or disconnect it, based on your needs.

Related

How to handle WebSocket-dependent data on server crash?

I'm using a Postgres database to maintain a list of rooms and the users connected to each room.
Users can enter a room whenever they want, but they should leave the room when they close the browser.
A good flow of events should be:
User enters room (user's room var is set) -> ... -> User disconnects and the server notices (user's room var is unset)
But what if this happens?
User enters room (user's room var is set) -> ... -> Server crashes or shuts down for updates -> User disconnects and the server doesn't notice (user's room var is still set) -> Server is back on
In this last case, the database state is already broken. What's the best way to deal with something like this? Thanks
Let's divide the answer into 2 aspects:
User Aspect:
Regardless of the language at hand, you should be made aware of disconnection events using socket event/exception handling.
If the server crashes, your user will experience an abrupt socket disconnection/connection closing/session termination, depending on which framework you are using. TCP sockets also have keepalive (SO_KEEPALIVE) exactly for that (you can usually control these, or similar, settings from the high-level protocol).
So all you need to do in that case is run maintenance code on the user's end (unset a variable, in the case you describe).
Server Aspect:
It's a bit trickier here. What you are basically looking for is ephemeral state management, meaning the ability to react to abrupt service/server termination (server crashes that leave a corrupted/unclean state) and to clean up after it.
Technologies like ZooKeeper or Consul exist for that. I personally recommend ZooKeeper, as I have built similar solutions on top of it several times in the past.
With ZooKeeper, when your server starts up, it can, for instance, create an EPHEMERAL node. That node is created when the server comes up and remains there for as long as the server is alive and connected to the ZooKeeper cluster. If the server crashes unexpectedly, this node is removed.
You can then have a separate application/script that listens for events on that zk node/path. If it is suddenly removed, you can run a cleanup routine on the database.
This approach supports multiple app instances, of course - you can listen for events under a path and have all server instances register using different nodes under it. The removed node can contain instance-specific identifiers, and you can use those to clean up that specific instance's state in the database.
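A minimal sketch with the kazoo client, assuming a /app/instances path, a per-instance identifier, and a cleanup_rooms_for helper (all illustrative):

    from kazoo.client import KazooClient

    zk = KazooClient(hosts="zk1:2181,zk2:2181")
    zk.start()

    # On each app server, at startup: register an ephemeral node. It disappears
    # automatically if the process dies or its ZooKeeper session expires.
    instance_id = "server-1"  # hypothetical per-instance identifier
    zk.ensure_path("/app/instances")
    zk.create(f"/app/instances/{instance_id}", ephemeral=True)

    # In the separate watcher/cleanup component (would normally be its own process):
    def cleanup_rooms_for(instance):
        # Hypothetical: unset the room vars of users that were connected to that instance.
        print(f"cleaning up database state left by {instance}")

    known = set()

    @zk.ChildrenWatch("/app/instances")
    def on_instances_change(children):
        global known
        for gone in known - set(children):
            cleanup_rooms_for(gone)
        known = set(children)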
It can also be a wise choice to move the clean-up/maintenance duty into a separate component.
(Note that ZooKeeper requires careful attention when dealing with connection/state events.)
Some additional Zookeeper reading material
Final Thoughts:
Of course the answer can be fine-tuned based on specific needs that were not presented in the question.
When building complex, stateful solutions, I personally aim to deal with crashes on all ends of the solution, playing it 'safe' where possible.

Redirecting client from load balancer to right consumer in Kafka

I am working on a personal project in which I want to be able to send one message from a producer to an end-user.
Each message will have a key that identifies the user that has to receive the message.
This is the overall structure I have imagined:
I cannot figure out how to tell the load balancer that whenever a user with, for example, key 2 contacts it, a connection (possibly a WebSocket) has to be set up with the consumer handling the partitions that contain key 2. Probably something could be done by using the same technique Kafka uses to assign a partition to a key, or by keeping track of the keys each consumer manages.
I do not know whether this is possible, but even if it were, the technique I described would probably couple the code too tightly to the architecture.
Could you please help me out with how I can achieve this? I do not want to store messages on a remote data store and retrieve them from a random consumer. I want the consumer to be able to serve the user as soon as possible whenever a connection is established with it. If there is no connection with that user, then I can store the message and deliver it when the connection is ready.
I eventually found the Push Messaging technique used at Netflix helpful. The trick is to add another level of indirection, made of web servers. Whenever a new client connects to one of the web servers, a tuple <client_id, webserver_id> is saved in an external data store. When the consumer needs to send a message to the client with a specific key, it looks the client up in that external registry to find where it is connected. Once found, it sends the message to the right web server, which pushes it to the client.
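A minimal sketch of that registry using Redis (the key naming and the store_for_later/forward_to_webserver helpers are assumptions):

    import redis

    registry = redis.Redis(host="redis", port=6379)

    # Web server side: record where each client is connected.
    def on_client_connect(client_id: str, webserver_id: str) -> None:
        registry.set(f"route:{client_id}", webserver_id)

    def on_client_disconnect(client_id: str) -> None:
        registry.delete(f"route:{client_id}")

    # Consumer side: deliver a message keyed by client_id.
    def deliver(client_id: str, message: bytes) -> None:
        webserver_id = registry.get(f"route:{client_id}")
        if webserver_id is None:
            store_for_later(client_id, message)  # hypothetical: persist until the client connects
        else:
            forward_to_webserver(webserver_id.decode(), message)  # hypothetical internal call

    def store_for_later(client_id, message):
        print(f"{client_id} is offline, message queued")

    def forward_to_webserver(webserver_id, message):
        print(f"forwarding to {webserver_id}, which pushes it over the client's WebSocket")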

P2P web based automated response based on user query

I would like to create a web-based P2P application between two nodes. A website shows a list of nodes. When a user (say A) clicks on a node, it must set up a P2P chat-like connection between the two. It goes like this: once the connection is established, node A can send a query message to B. Once B receives the query message, B must respond with the correct answer (if A queries: RETRIEVE x.txt, B's response must be the contents of x.txt). I would like to be pointed in the right direction regarding the proper tech/protocols to use. Thank you 😀
Firstly, if you want to reach all the nodes, you need to collect their information so that when you click one, you can connect to it.
Secondly, if you want to connect to these nodes, you need to do NAT traversal so that they can connect to each other.
Thirdly, you may want a reliable connection, so you need reliable UDP.
Accordingly, you need the following:
Create a central controller, like a tracker, to collect node information (see the sketch below)
Implement NAT traversal, like NAT-PMP and UPnP; it's better if you can build ICE into the central controller
When a node is clicked, use UDP to connect to it
If you want the connection to be reliable, you may also need reliable UDP, like QUIC, KCP, or libutp.
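To make the tracker point concrete, here is a minimal UDP tracker sketch (the port, the REGISTER/LIST message format, and replying with the peer list are all assumptions):

    import socket

    # Central controller ("tracker"): peers REGISTER, anyone can ask for the LIST.
    peers = set()

    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind(("0.0.0.0", 9000))

    while True:
        data, addr = sock.recvfrom(1024)
        command = data.decode().strip()
        if command == "REGISTER":
            # addr is the peer's public IP/port as seen by the tracker,
            # which is what the other side needs for NAT hole punching.
            peers.add(addr)
            sock.sendto(b"OK", addr)
        elif command == "LIST":
            listing = "\n".join(f"{ip}:{port}" for ip, port in peers)
            sock.sendto(listing.encode(), addr)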

websocket communication between clients in distributed system

I'm trying to build an instant messaging app. Clients will not only send messages but also, often, audio. I've decided to use a WebSocket connection to communicate with clients: it is fast and allows sending binary data.
The main idea is to receive a message from client1 and notify client2 about it. But here's the thing: my app will be running on GAE. What if client1's socket is opened on server1 and client2's is opened on server2? These servers don't know about each other's clients.
I have one idea how to solve it, but I am sure it is a shitty way. I am going to use some sort of communication between the servers (for example JMS, or opening another WebSocket connection between servers; it doesn't matter right now).
But it will surely lead to a disaster. I can't even imagine how often those servers would have to talk to each other. For each message, server1 should notify server2, and server2 should notify client2. Things become even worse when serverN comes into play.
Another option I see is Firebase. But it restricts message size to 4KB, so I can't send audio through it. As a workaround I could notify the client about new audio and have it fetch the audio from my server.
Hope I explained the problem clearly. Does anyone know how to solve it? Or maybe there are other ways to build such apps?
If you are building a messaging cluster and expect communicating clients to connect to different instances of the server then server-server communication is inevitable. Usually it's not a problem though.
First, if you don't use any load balancing, your clients will connect to the same server 50% of the time on average (in the case of 2 servers).
Second, intra-datacenter links are fast and free in all known public clouds.
Third, you can often do something smart on the frontend to make sure two clients that are likely to communicate connect to the same server. For instance, direct all clients from the same country to the same server using DNS load balancing.
The second part of the question is about passing large media files. It's a common best practice to send them out of band - store them on the server and only pass a reference. As someone suggested in the comments, save the audio on the server and just send a message like "audio is available, fetch it from here ...". You don't need to poll the server for that; just fetch it once, when the receiving client requests it.
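A minimal sketch of that out-of-band pattern (the storage location and the message schema are assumptions):

    import json
    import uuid

    def send_audio(sender_ws, audio_bytes: bytes) -> None:
        # Store the blob out of band and only push a small reference over the WebSocket.
        audio_id = str(uuid.uuid4())
        save_blob(audio_id, audio_bytes)
        notification = {"type": "audio", "url": f"/media/{audio_id}"}
        sender_ws.send(json.dumps(notification))

    def save_blob(audio_id: str, data: bytes) -> None:
        # Hypothetical storage call; in practice this would be S3/GCS/etc.
        with open(f"/tmp/{audio_id}", "wb") as f:
            f.write(data)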
In general, it seems like you are trying to reinvent the wheel. Just use something off the shelf.
Let clients connect to any of multiple servers, and have each server keep this metadata
A centralized system like ZooKeeper stores the details of the active servers
When a client c1 sends a message to client c2:
the message is received by a server (say s1; we can add a load balancer to distribute incoming requests)
s1 broadcasts this information to all other servers to find out which server client c2 is connected to, OR, a better approach, uses consistent hashing to decide which server each client connects to; with that approach the broadcast is not required (see the sketch after this list)
the corresponding server (say s2) responds to server s1
now s1 sends the message m to s2, and server s2 delivers it to client c2
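A minimal consistent-hashing sketch for the "which server owns client c2" lookup (server names and the number of virtual nodes are assumptions):

    import bisect
    import hashlib

    class ConsistentHashRing:
        """Maps a client id to a server; adding or removing servers moves few keys."""

        def __init__(self, servers, vnodes=100):
            ring = []
            for server in servers:
                for i in range(vnodes):
                    ring.append((self._hash(f"{server}#{i}"), server))
            ring.sort()
            self._hashes = [h for h, _ in ring]
            self._servers = [s for _, s in ring]

        @staticmethod
        def _hash(key: str) -> int:
            return int(hashlib.md5(key.encode()).hexdigest(), 16)

        def server_for(self, client_id: str) -> str:
            idx = bisect.bisect(self._hashes, self._hash(client_id)) % len(self._hashes)
            return self._servers[idx]

    # Every server (and the load balancer) computes the same owner locally,
    # so no broadcast to the other n-1 servers is needed.
    ring = ConsistentHashRing(["s1", "s2", "s3"])
    print(ring.server_for("c2"))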
Cons of the above approach:
Each server will have a connection to the other n-1 servers, creating a mesh topology
The centralized system (ZooKeeper) becomes a single point of failure (which is solvable)
Apps like WhatsApp and G-Talk use XMPP over TCP/IP.

Asterisk HA and SIP registration

I set up an Active/Passive cluster with Pacemaker/Corosync/DRBD. I wanted to make an Asterisk server HA. The solution works perfectly, but when the service fails on one server and starts on another, all SIP clients registered with the previously active server are lost. And the passive server shows nothing in the output of:
sip show peers
until clients make a call or register again. One solution is to set the registration interval on clients to 1 minute or so. Are there other options? For example, would integrating Asterisk with a DBMS help to save this kind of state in a DB?
First of all, building clusters as a non-expert is a bad idea.
You can use the realtime SIP architecture, which saves state in a database. Complexity: average. Note that "sip show peers" also shows nothing for realtime peers.
You can use a memory-duplicating cluster (solutions for Xen exist) which copies memory state from one server to the other. Complexity: very high.