How does the messenger maintain the sequencing of the messages during chat and when users log in again? - distributed-computing

I was asked this question in an interview and was unable to answer it.
How does FB Messenger order messages on the user side when two messages are concurrent, so that both users see the same display order during the chat and when they visit Messenger again? I thought we could store a timestamp with each message, namely the time the server received it. However, this does not ensure a consistent ordering of messages across clients.
A scenario where the server timestamp cannot determine the display order looks like this:
User-1 sends a message M1 to the server for User-2.
The server receives M1 at T1.
Meanwhile, User-2 sends a message M2 to the server for User-1.
The server receives the message M2 at T2, such that T2 > T1.
The server sends message M1 to User-2 and M2 to User-1.
So User-1 will see M1 first and then M2, whereas User-2 will see M2 first and then M1.
I read that vector clocks can resolve this issue, but I was unable to understand how the message sequencing would be preserved for different users during the chat and when users log in again.
In the above scenario, user1 will see M1 followed by M2, whereas user2 will see M2 followed by M1. Now suppose each user also generates a sequence number for each of its messages, kept separately for each client it talks to. Then user1 will send message M1 with sequence <1 (user1 seq), 0 (user2 seq)> and user2 will send message M2 with sequence <0 (user1 seq), 1 (user2 seq)>. So when both messages arrive at user1 and user2, they will have:
M1 <1, 0>
M2 <0, 1>
Now let’s say user1 sends more messages, M3 <2, 1> and M4 <3, 1>. Then each client will have the following messages:
M1 <1, 0>
M2 <0, 1>
M3 <2, 1>
M4 <3, 1>
So in this case, while the users are logged in, the display order during the chat will be M1, M2, M3, M4 for user-1 and M2, M1, M3, M4 for user-2. Now I want to know: how will the same order be preserved on each user's end when they log in again?
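As a minimal sketch of what the <user1 seq, user2 seq> pairs above can and cannot tell us, the following Python compares them the way vector clock entries are compared. The function names are illustrative assumptions and do not come from any Messenger API.

```python
from typing import Tuple

# A clock is the pair <user1 seq, user2 seq> from the listing above.
Clock = Tuple[int, int]


def happened_before(a: Clock, b: Clock) -> bool:
    """True if a causally precedes b: a <= b in every component and a != b."""
    return all(x <= y for x, y in zip(a, b)) and a != b


def concurrent(a: Clock, b: Clock) -> bool:
    """True if neither message causally precedes the other."""
    return a != b and not happened_before(a, b) and not happened_before(b, a)


# The four messages from the example (hypothetical in-memory representation).
messages = {"M1": (1, 0), "M2": (0, 1), "M3": (2, 1), "M4": (3, 1)}

m1_m2_concurrent = concurrent(messages["M1"], messages["M2"])
m3_after_both = (happened_before(messages["M1"], messages["M3"])
                 and happened_before(messages["M2"], messages["M3"]))
```

The pairs only say that M1 and M2 are concurrent and that M3 and M4 come after both; they do not by themselves pick one display order for M1 and M2, which is exactly the open question.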
Thanks.

The problem here is how we will generate a consistent chat conversation for each user from these sequence numbers.
Let's assume a conversation between Alice and Bob.
Message sequence structure:
message<Alice seq number, Bob sequence number>
Note that the numbers in M1, M2, M3, ... are only used to tell the messages apart and have no relation to the actual message order.
View from Alice side:
1) Alice sends M1<1,0>
2) Bob sends M2<1,1>
3) Alice sends M3<2,1>
Now Bob sends one message (M5), but before Alice gets it, Alice sends one more message.
4) Alice sends M4<3,1>
Only now does she receive the message from Bob.
5) Bob sends M5<2,2>
Since Bob had not received M4 before sending M5, the Alice sequence number in M5 is 2.
If he had received it, M5 would be M5<3,2>.
Now, View from Bob side:
1) Alice sends M1<1,0>
2) Bob sends M2<1,1>
3) Alice sends M3<2,1>
Now Bob sends message M5 before getting M4 from Alice.
4) Bob sends M5<2,2>
5) Alice sends M4<3,1>
Now when Alice logs in next time, the server will fetch the messages and sort them:
1) First sort by Bob's sequence number.
2) If two or more messages have the same Bob sequence number, sort those by Alice's sequence number.
Similarly for Bob:
1) First sort the messages by Alice's sequence number.
2) If two or more messages have the same Alice sequence number, sort those by Bob's sequence number.
So for Alice, it would be in the order of Bob's sequence number:
M1<1,0>
M2<1,1>
M3<2,1>
M4<3,1>
M5<2,2>
For Bob, it would be in the order of Alice's sequence number:
M1<1,0>
M2<1,1>
M3<2,1>
M5<2,2>
M4<3,1>
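The two sort rules above can be sketched in a few lines of Python. The Message type and the function names are assumptions made for illustration; the data is the five messages from the example, deliberately stored out of order.

```python
from typing import List, NamedTuple


class Message(NamedTuple):
    name: str
    alice_seq: int
    bob_seq: int


def view_for_alice(msgs: List[Message]) -> List[Message]:
    # Alice's view: primary key is Bob's sequence number, tie-break on Alice's.
    return sorted(msgs, key=lambda m: (m.bob_seq, m.alice_seq))


def view_for_bob(msgs: List[Message]) -> List[Message]:
    # Bob's view: primary key is Alice's sequence number, tie-break on Bob's.
    return sorted(msgs, key=lambda m: (m.alice_seq, m.bob_seq))


# Messages as the server might return them, in arbitrary order.
stored = [
    Message("M4", 3, 1),
    Message("M1", 1, 0),
    Message("M5", 2, 2),
    Message("M3", 2, 1),
    Message("M2", 1, 1),
]

alice_order = [m.name for m in view_for_alice(stored)]
bob_order = [m.name for m in view_for_bob(stored)]
```

With this data, alice_order comes out as M1, M2, M3, M4, M5 and bob_order as M1, M2, M3, M5, M4, matching the two listings above.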
How will we store the message sequences in the database?
How will a client know which sequence number is its own?
In our example we decided that the first number is Alice's sequence number and the second is Bob's. But how is this decision made in practice? This is easily solved with a convention: the first sequence number is always the sender's and the second is the receiver's. So when someone receives a message, he knows that the first sequence number is the sender's. When he prepares his next message, he increments his own sequence number and puts it in first place, then takes the sender's sequence number from the last received message and puts it in second place.
How will the server know which sequence number to store where?
With the above convention, when the server gets a message from Alice, the first field is Alice's sequence number and the second is Bob's, so it stores them that way. It does the same for Bob's messages, where the first field is Bob's sequence number.
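Here is a rough sketch of the sender-first convention, replaying the Alice and Bob trace from above. The ChatClient class and the to_alice_bob helper are hypothetical names, and the sketch assumes each client keeps its own counter locally rather than reading it back from received messages.

```python
from typing import Tuple

Pair = Tuple[int, int]  # on the wire: <sender seq, receiver seq>


class ChatClient:
    def __init__(self, name: str) -> None:
        self.name = name
        self.own_seq = 0   # how many messages this client has sent
        self.peer_seq = 0  # highest peer sequence number seen so far

    def send(self) -> Pair:
        # Increment our own counter and attach the last peer value we saw.
        self.own_seq += 1
        return (self.own_seq, self.peer_seq)

    def receive(self, pair: Pair) -> None:
        # The first field of an incoming pair is always the sender's number.
        sender_seq, _ = pair
        self.peer_seq = max(self.peer_seq, sender_seq)


def to_alice_bob(sender: str, pair: Pair) -> Pair:
    """Server side: normalise a wire pair to <Alice seq, Bob seq> for storage."""
    return pair if sender == "alice" else (pair[1], pair[0])


alice, bob = ChatClient("alice"), ChatClient("bob")
m1 = alice.send(); bob.receive(m1)
m2 = bob.send(); alice.receive(m2)
m3 = alice.send(); bob.receive(m3)
m5 = bob.send()      # Bob sends M5 before M4 reaches him
m4 = alice.send()    # Alice sends M4 before M5 reaches her
alice.receive(m5); bob.receive(m4)

stored = {
    "M1": to_alice_bob("alice", m1), "M2": to_alice_bob("bob", m2),
    "M3": to_alice_bob("alice", m3), "M4": to_alice_bob("alice", m4),
    "M5": to_alice_bob("bob", m5),
}
```

The stored pairs reproduce the values used in the walkthrough: M4 is <3,1> and M5 is <2,2>.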
Note: I was also looking for a solution to this problem but found nothing on the net that helped, so I came up with my own. Please correct me if it breaks for any use case, so that we can improve it or try something else.

Related

Nats.io: Will the subscribers with the different subjects but the same named queue process messages in parallel?

Let's imagine we have server A with a publisher and servers B and C with consumers.
We also have 5 different subjects: foo1, foo2, ... foo5.
We always want to send a message to only one consumer and receive only one response.
So we use the requestOne function from the JS SDK on the publisher side and the subscribe function with the {queue: "default"} option.
So both servers B and C subscribe once to each subject.
But every time they subscribe, they use the queue named "default" to prevent multiple consumers from receiving the same message, as mentioned in the docs.
So the question is:
Will this queue named "default" be shared across all the subjects? Or will each subject have its own queue named "default", shared only between the subscribers of that particular subject?
For example, the producer generates 10 messages, 2 for each subject.
Will 10 messages be processed at the same time, or only 2, since all the subscriptions share the queue named "default"?
You form a queue group based on the queue name that you specify and the subject. So a queue group of "foo" is different than a queue group on "bar".
That being said, with wildcards, you could have multiple subjects being part of the same queue group. That is, 2 members of the group "bar" listening on "foo.*" would split processing of messages sent on "foo.bar", "foo.baz", etc..
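To illustrate the exact-subject case, here is a toy in-memory model in Python. It is not the NATS client API: ToyBroker and its methods are made up for this sketch, wildcards are not modelled, and round-robin stands in for whatever member selection the real server performs.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


class ToyBroker:
    """Toy model: a queue group is identified by (subject, queue name)."""

    def __init__(self) -> None:
        self.groups: Dict[Tuple[str, str], List[str]] = defaultdict(list)
        self._next: Dict[Tuple[str, str], int] = defaultdict(int)

    def queue_subscribe(self, subject: str, queue: str, member: str) -> None:
        self.groups[(subject, queue)].append(member)

    def publish(self, subject: str) -> List[str]:
        """Deliver to exactly one member of each group on this subject."""
        delivered = []
        for (subj, queue), members in self.groups.items():
            if subj != subject:
                continue
            i = self._next[(subj, queue)]
            delivered.append(members[i % len(members)])
            self._next[(subj, queue)] = i + 1
        return delivered


subjects = ["foo1", "foo2", "foo3", "foo4", "foo5"]
broker = ToyBroker()
for s in subjects:
    for server in ("B", "C"):
        broker.queue_subscribe(s, "default", server)

# Two messages per subject, ten in total.
deliveries = [broker.publish(s) for s in subjects for _ in range(2)]
```

In this model there are five independent groups, one per subject, and each of the ten messages goes to exactly one of B or C.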
The same queue name in different subjects is separate.
You can test it with the examples in the link below.
https://nats.io/documentation/additional_documentation/nats-queueing/
Start the NATS server:
gnatsd
Subscribe to subject1:
go run nats-qsub.go subject1 default
...
Subscribe to subject2:
go run nats-qsub.go subject2 default
...
Publish to subject1 and subject2:
go run nats-pub.go subject1 "message"
...
go run nats-pub.go subject2 "message"
...

Why Jain SipDialog didn't increase localSequenceNumber after it sent reply for Subscribe request?

I'm using SipUnit to test my SIP application, which just forwards requests. In my simple test case, user1 (simulated with SipPhone) sends a Subscribe request, my application forwards it to user2 (simulated with SipPhone), and user2 then replies using the JAIN ServerTransaction.sendResponse() method.
Then user2 sends a Notify to user1 using JAIN SipDialog.sendRequest().
Wireshark shows a problem in this Notify request: the CSeq is "1 Notify", but I expected "2 Notify", since it is in the same dialog as the Subscribe and the sequence number should therefore increase by 1.
Any idea?
When one party (say A) sends SUBSCRIBE to another (say B), the NOTIFY will typically come from the B side, and each direction (A to B and B to A) counts its requests separately. So SUBSCRIBE will be CSeq 1 from A to B, and the NOTIFY will be CSeq 1 from B to A.
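A small Python sketch of the per-direction counting follows. It is a toy model, not the JAIN SIP API: DialogSide is a made-up class, and starting each counter at 1 is an assumption that matches the trace in the question.

```python
class DialogSide:
    """One endpoint of a dialog with its own local request counter."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.local_cseq = 0  # counts only requests sent by this side

    def next_request(self, method: str) -> str:
        # Each new request from this side bumps this side's counter only.
        self.local_cseq += 1
        return f"CSeq: {self.local_cseq} {method}"


a = DialogSide("A")
b = DialogSide("B")

subscribe = a.next_request("SUBSCRIBE")   # A to B
first_notify = b.next_request("NOTIFY")   # B to A, B's first request
second_notify = b.next_request("NOTIFY")  # B to A, B's second request
```

B's first NOTIFY carries CSeq 1 because A's SUBSCRIBE never touched B's counter; only B's own later requests increase it.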

Phoenix Channels - Multiple channels per socket

I'm writing an application using Phoenix Channels to handle realtime events. I understand that there will be 1 socket open per client and that multiple channels can be multiplexed over it. My app is a chat application where users are part of multiple group chats. I have 1 Phoenix Channel called MessageChannel whose join function handles dynamic topics.
def join("groups:" <> group_id, payload, socket) do
....
Let's say John joins groups/topics A and B while Bob only joins group/topic B. When John sends a message to group/topic A, broadcast!/3 will also send that message to Bob, correct? Because handle_in doesn't have a context of which topic/group the message was sent to.
How would I handle it so that Bob doesn't receive the events that were sent to group A? Am I designing this right?
Because handle_in doesn't have a context of which topic/group the message was sent to.
When Phoenix.Channel.broadcast/3 is called, apparently it does have the topic associated with the message (which is not obvious from the signature). You can see the code starting on this line of channel.ex:
def broadcast(socket, event, message) do
  %{pubsub_server: pubsub_server, topic: topic} = assert_joined!(socket)
  Server.broadcast pubsub_server, topic, event, message
end
So when the call to broadcast/3 is made using the socket, it pattern matches out the current topic, and then makes a call to the underlying Server.broadcast/4.
(If you're curious like I was, this in turn makes a call to the underlying PubSub.broadcast/3 which does some distribution magic to route the call to your configured pubsub implementation server, most likely using pg2 but I digress...)
So, I found this behavior not obvious from reading the Phoenix.Channel docs, but they do state it explicitly in the phoenixframework channels page in Incoming Events:
broadcast!/3 will notify all joined clients on this socket's topic and invoke their handle_out/3 callbacks.
So it's only being broadcast "on this socket's topic". They define topic on that same page as:
topic - The string topic or topic:subtopic pair namespace, for example “messages”, “messages:123”
So in your example, the "topics" are actually the topic:subtopic pair namespace strings: "groups:A" and "groups:B". John would have to subscribe to both of these topics separately on the client, so you would actually have references to two different channels, even though they're using the same socket. So assuming you're using the javascript client, the channel creation looks something like this:
let channelA = this.socket.channel("groups:A", {});
let channelB = this.socket.channel("groups:B", {});
Then when you go to send a message on the channel from a client, you are using only the channel that has a topic that gets pattern matched out on the server as we saw above.
channelA.push(msgName, msgBody);
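The topic scoping described above can be modelled with a toy pub/sub in Python. This is not Phoenix code: ToyPubSub and its methods are invented for the sketch, which only shows that a broadcast on "groups:A" reaches the members of that topic and nobody else.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple


class ToyPubSub:
    """Toy model of topic-scoped broadcast."""

    def __init__(self) -> None:
        self._members: Dict[str, Set[str]] = defaultdict(set)
        self.inbox: Dict[str, List[Tuple[str, str, str]]] = defaultdict(list)

    def join(self, user: str, topic: str) -> None:
        self._members[topic].add(user)

    def broadcast(self, topic: str, event: str, payload: str) -> None:
        # Only users who joined this exact topic receive the event.
        for user in self._members[topic]:
            self.inbox[user].append((topic, event, payload))


pubsub = ToyPubSub()
pubsub.join("john", "groups:A")
pubsub.join("john", "groups:B")
pubsub.join("bob", "groups:B")

pubsub.broadcast("groups:A", "new_msg", "hello A")
john_inbox = list(pubsub.inbox["john"])
bob_inbox = list(pubsub.inbox["bob"])
```

After the broadcast, John has one message and Bob has none, even though both share the "groups:B" topic.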
Actually, the socket routing is done based on how you define your topics in your project's Socket module with the channel API. For my Slack clone, I use three channels: a system-level channel to handle presence updates, a user channel, and a room channel.
Any given user is joined to 0 or 1 room channels at a time. However, a user may be subscribed to a number of rooms.
For messages going out to a specific room, I broadcast them over the room channel.
When I detect unread messages, notifications, or badges for a particular room, I use the user channel. Each user channel stores the list of rooms the user has subscribed to (they are listed on the client's side bar).
The trick to all this is using a couple of channel APIs, mainly intercept, handle_out, My.Endpoint.subscribe, and handle_info(%Broadcast{}, socket).
I use intercept to catch broadcast messages that I want to either ignore or manipulate before sending them out.
In the user channel, I subscribe to events broadcast from the room channel.
When you subscribe, you get a handle_info call with the %Broadcast{} struct that includes the topic, event, and payload of the broadcast message.
Here are a couple of pieces of my code:
defmodule UcxChat.UserSocket do
  use Phoenix.Socket
  alias UcxChat.{User, Repo, MessageService, SideNavService}
  require UcxChat.ChatConstants, as: CC
  ## Channels
  channel CC.chan_room <> "*", UcxChat.RoomChannel # "ucxchat:"
  channel CC.chan_user <> "*", UcxChat.UserChannel # "user:"
  channel CC.chan_system <> "*", UcxChat.SystemChannel # "system:"
  # ...
end
# user_channel.ex
# ...
intercept ["room:join", "room:leave", "room:mention", "user:state", "direct:new"]
#...
def handle_out("room:join", msg, socket) do
  %{room: room} = msg
  UserSocket.push_message_box(socket, socket.assigns.channel_id, socket.assigns.user_id)
  update_rooms_list(socket)
  clear_unreads(room, socket)
  {:noreply, subscribe([room], socket)}
end
def handle_out("room:leave" = ev, msg, socket) do
  %{room: room} = msg
  debug ev, msg, "assigns: #{inspect socket.assigns}"
  socket.endpoint.unsubscribe(CC.chan_room <> room)
  update_rooms_list(socket)
  {:noreply, assign(socket, :subscribed, List.delete(socket.assigns[:subscribed], room))}
end
# ...
defp subscribe(channels, socket) do
  # debug inspect(channels), ""
  Enum.reduce channels, socket, fn channel, acc ->
    subscribed = acc.assigns[:subscribed]
    if channel in subscribed do
      acc
    else
      socket.endpoint.subscribe(CC.chan_room <> channel)
      assign(acc, :subscribed, [channel | subscribed])
    end
  end
end
# ...
end
I also use the user_channel for all events related to a specific user like client state, error messages, etc.
Disclaimer: I have not looked at the internal workings of a channel, this information is completely from my first experience of using channels in an application.
When someone joins a different group (based on the pattern matching in your join/3), a connection over a separate channel (socket) is made. Thus, broadcasting to A will not send messages to members of B, only A.
It seems to me the Channel module is similar to a GenServer and the join is somewhat like start_link, where a new server (process) is spun up (however, only if it does not already exist).
You can really ignore the inner workings of the module and just understand that if you join a channel with a different name than already existing ones, you are joining a unique channel. You can also just trust that if you broadcast to a channel, only members of that channel will get the message.
For instance, in my application, I have a user channel that I want only a single user to be connected to. The join looks like def join("agent:" <> _agent, payload, socket) where agent is just an email address. When I broadcast a message to this channel, only the single agent receives the message. I also have an office channel that all agents join and I broadcast to it when I want all agents to receive the message.
Hope this helps.

unsubscribe selected subscribers from node in xmpp

In the XMPP publish-subscribe protocol there is a provision to subscribe to and unsubscribe from a node. But what if the publisher itself wants to temporarily unsubscribe some of the subscribers and keep publishing to selected subscribers only?
For example:
A, B and C have subscribed to node PIZZA. At some point, the PIZZA node wants to publish only to A and C, but not B.
I read the protocol but didn't find anything like this. Does anyone have an idea how to do it?
I am using Openfire as the server and the asmack libs as the client.
I don't know much about xmpp, maybe this is standard practice there, but normally the publisher doesn't know anything about the receivers so shouldn't control who is subscribed. Why does the publisher know better than the receiver whether the receiver should receive?
I would try a different approach, such as adding data in the message so the receiver can decide whether they should ignore the message.
Sending a blank message would not likely work: then all receivers that handle message only if not blank will skip it. So it will only work if B does not filter on blank messages. Instead, if the message has "filter=...", then receivers can decide to process based on the value of filter. Like, perhaps receivers A and C are a Type "X" of receiver, and receivers B and D are a type "Y" of receiver. Then if filter = "X", then receivers B and D know to ignore it. If filter is "Y", A and C know to ignore it. If filter is empty, they all process it.
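A minimal Python sketch of this receiver-side filtering is below. The message shape (a dict with an optional "filter" key) and the should_process name are assumptions made for illustration; they are not part of XMPP, Openfire, or asmack.

```python
from typing import Dict


def should_process(message: Dict[str, str], receiver_type: str) -> bool:
    """Process if the filter is empty or names this receiver's type."""
    wanted = message.get("filter", "")
    return wanted == "" or wanted == receiver_type


# Receivers A and C are type "X", B and D are type "Y", as in the example.
receiver_types = {"A": "X", "B": "Y", "C": "X", "D": "Y"}

for_x_only = {"body": "pizza is ready", "filter": "X"}
for_everyone = {"body": "shop closes at 10"}

handled_x = sorted(r for r, t in receiver_types.items()
                   if should_process(for_x_only, t))
handled_all = sorted(r for r, t in receiver_types.items()
                     if should_process(for_everyone, t))
```

Here handled_x contains only A and C, while handled_all contains all four receivers. Note that every subscriber still receives every message; the filtering is purely a client-side convention.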

Infinity confirmation loop

I came across an interesting theoretical problem:
Let's assume we have Program A and Program B connected via some IPC mechanism such as a TCP socket or a named pipe. Program A sends some data to Program B, and depending on the success of the data delivery both A and B do some operations. However, B should do its operation only if it is sure that A has got the delivery confirmation. So we end up with 3 connections:
A -> B [data transfer]
B -> A [delivery confirmation]
A -> B [confirmation of getting delivery confirmation]
It may look weird, but the goal is to not do any operation on either A or B until both sides know that the data has been transferred.
And here is the problem: the second connection confirms the success of the first, and the third confirms the second, but in fact there is no guarantee that connections 2 and 3 will not fail, and in that case we fall into an infinite loop of confirmations. Is there some CS theory that solves this problem?
If I read your question right, the problem is called the "Two Generals' Problem". The gist of the issue is that the last entity to send either a message or an acknowledgement knows nothing about whether what it just sent arrived, and adding another acknowledgement only moves that uncertainty to the new last sender.