NATS.io: Will subscribers with different subjects but the same named queue process messages in parallel?

Let's imagine we have server A with a publisher, and servers B and C with consumers.
We also have 5 different subjects: foo1, foo2, ..., foo5.
We always want to send a message to only one consumer and receive only one response.
So we use the requestOne function from the JS SDK on the publisher side and the subscribe function with the {queue: "default"} option on the consumer side.
Both servers B and C have subscribed once to each subject.
But every time they subscribe, they use the queue name "default" to prevent multiple consumers from receiving the same message, as mentioned in the docs.
So the question is:
Will this queue named "default" be shared across all the subjects? Or will each subject have its own queue named "default" that is only shared between the subscribers of that particular subject?
For example: the producer generates 10 messages, 2 for each subject.
Will we have 10 messages processed at the same time, or only 2, since all the subscriptions share the same queue named "default"?

You form a queue group based on the queue name that you specify and the subject, so a queue group on "foo" is different from a queue group on "bar".
That being said, with wildcards you could have multiple subjects being part of the same queue group. That is, 2 members of the group "bar" listening on "foo.*" would split the processing of messages sent on "foo.bar", "foo.baz", etc.
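The question uses the JS client, but the semantics are the same in every client. Here is a minimal sketch with the Java client (jnats), assuming a server on localhost; the subject names are just illustrations:

import io.nats.client.*;
import java.nio.charset.StandardCharsets;

public class QueueGroupDemo {
    public static void main(String[] args) throws Exception {
        Connection nc = Nats.connect("nats://localhost:4222");
        Dispatcher d = nc.createDispatcher(msg ->
                System.out.println(msg.getSubject() + ": "
                        + new String(msg.getData(), StandardCharsets.UTF_8)));
        // Two distinct queue groups, even though both are named "default":
        // the (subject, queue name) pair identifies the group.
        d.subscribe("foo1", "default");
        d.subscribe("foo2", "default");
        // One queue group spanning all subjects matched by the wildcard.
        d.subscribe("foo.*", "default");
        Thread.sleep(5000); // keep the process alive to receive messages
        nc.close();
    }
}

So messages on foo1 and foo2 are balanced independently of each other, while everything matching "foo.*" is balanced within a single group.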

The same queue name on different subjects forms separate queue groups.
You can test it with the examples in the link below:
https://nats.io/documentation/additional_documentation/nats-queueing/
Start the NATS server:
gnatsd
Subscribe to subject1 (queue "default"):
go run nats-qsub.go subject1 default
...
Subscribe to subject2 (queue "default"):
go run nats-qsub.go subject2 default
...
Publish to subject1 and subject2:
go run nats-pub.go subject1 "message"
...
go run nats-pub.go subject2 "message"
...

Related

Architecture for ML jobs platform

I'm building a platform to run ML jobs.
Jobs will be started from an interface.
I'm making a service for each type of job. Sometimes, a service S1 might need to first make a request to another service S2 and get its output before running its own job.
Each service is split into 2 Kubernetes deployments:
one that pulls the message from a topic, checks it, and persists it to a database (D1)
one that reads requests from the database, runs the actual job, updates the request state in the database, and then answers the client (D2)
Here is the flow:
the interface generates a PubSub message to a topic T1
D1 pulls the message from T1 and persists a request to a database
D2 sees the new request in the database, runs it, then updates its state in the database and answers the client
To answer the client, D2 has 2 options:
push a message to a PubSub topic T2 that is continuously checked by the client. An id is passed in both request and response so that only the right client can pull it from the topic.
use a callback provided by the client to make a POST request
What do you think about this architecture? Does the usage of PubSub make sense? Also, does it make sense to split each service into 2 deployments (one that deals with the request, one that runs the actual job)?
the interface generates a PubSub message to a topic T1; D1 pulls the message from T1 and persists a request to a database
If there's only one database, I'm not sure I see much advantage in using a topic (implying pub/sub). Another approach would be to use a queue: the interface creates jobs into the queue, then you can have any number of workers processing it. Depending on the situation you may not even need the database at all - if all the data needed can be in the message in the queue.
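To make the competing-consumers idea concrete, here is a minimal, self-contained Java sketch; the in-memory queue, job names, and worker count are all made up, and a broker-backed queue would take the in-memory queue's place:

import java.util.concurrent.*;

public class WorkerPoolDemo {
    public static void main(String[] args) throws InterruptedException {
        // Stand-in for a broker-backed job queue (payloads are hypothetical)
        BlockingQueue<String> jobs = new LinkedBlockingQueue<>();
        for (int i = 0; i < 10; i++) jobs.add("job-" + i);

        int workers = 3; // scale this number instead of adding topics
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        for (int w = 0; w < workers; w++) {
            pool.submit(() -> {
                String job;
                while ((job = jobs.poll()) != null) { // each job goes to exactly one worker
                    System.out.println(Thread.currentThread().getName() + " ran " + job);
                }
            });
        }
        pool.shutdown();
        pool.awaitTermination(10, TimeUnit.SECONDS);
    }
}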
use a callback provided by the client to make a POST request
That's better if you can do it, on the assumption that there's only one consumer for the event; pub/sub is more for broadcasting out to multiple consumers. Polling works but is really inefficient and has limits on how much it can scale.
Also does it make sense to split each service into 2 deployments (1 that deals with request, 1 that runs the actual job)?
Having separate deployables makes sense if they are built by different teams, have a different release cadence, or need to scale out independently; otherwise it may not be necessary.

Phoenix Channels - Multiple channels per socket

I'm writing an application using Phoenix Channels to handle realtime events. I understand that there will be 1 socket open per client, which can multiplex multiple channels. My app is a chat application where users are part of multiple group chats. I have 1 Phoenix Channel called MessageChannel whose join function handles dynamic topics:
def join("groups:" <> group_id, payload, socket) do
....
Let's say John joins groups/topics A and B, while Bob only joins group/topic B. When John sends a message to group/topic A, will broadcast!/3 also send that message to Bob? Because handle_in doesn't have a context of which topic/group the message was sent to.
How would I handle it so that Bob doesn't receive events that were sent to group A? Am I designing this right?
Because handle_in doesn't have a context of which topic/group the message was sent to.
When Phoenix.Channel.broadcast/3 is called, it does in fact have the topic associated with the message (which is not obvious from the signature). You can see the code starting on this line of channel.ex:
def broadcast(socket, event, message) do
  %{pubsub_server: pubsub_server, topic: topic} = assert_joined!(socket)
  Server.broadcast pubsub_server, topic, event, message
end
So when the call to broadcast/3 is made using the socket, it pattern matches out the current topic, and then makes a call to the underlying Server.broadcast/4.
(If you're curious like I was, this in turn makes a call to the underlying PubSub.broadcast/3 which does some distribution magic to route the call to your configured pubsub implementation server, most likely using pg2 but I digress...)
So, I found this behavior not obvious from reading the Phoenix.Channel docs, but they do state it explicitly in the phoenixframework channels page in Incoming Events:
broadcast!/3 will notify all joined clients on this socket's topic and invoke their handle_out/3 callbacks.
So it's only being broadcasted "on this socket's topic". They define topic on that same page as:
topic - The string topic or topic:subtopic pair namespace, for example “messages”, “messages:123”
So in your example, the "topics" are actually the topic:subtopic pair namespace strings: "groups:A" and "groups:B". John would have to subscribe to both of these topics separately on the client, so you would actually have references to two different channels, even though they're using the same socket. So assuming you're using the javascript client, the channel creation looks something like this:
let channelA = this.socket.channel("groups:A", {});
let channelB = this.socket.channel("groups:B", {});
Then when you go to send a message on the channel from a client, you are using only the channel that has a topic that gets pattern matched out on the server as we saw above.
channelA.push(msgName, msgBody);
Actually, the socket routing is done based on how you define your topics in your project's Socket module with the channel API. For my Slack clone, I use three channels: a system-level channel to handle presence updates, a user channel, and a room channel.
Any given user is subscribed to at most one system channel and one user channel; however, users may be subscribed to a number of room channels.
For messages going out to a specific room, I broadcast them over the room channel.
When I detect unread messages, notifications, or badges for a particular room, I use the user channel. Each user channel stores the list of rooms the user has subscribed to (they are listed in the client's side bar).
The trick to all this is using a couple of channel APIs, mainly intercept, handle_out, My.Endpoint.subscribe, and handle_info(%Broadcast{}, socket).
I use intercept to catch broadcasted messages that I want to either ignore or manipulate before sending them out.
In the user channel, I subscribe to events broadcast from the room channel.
When you subscribe, you get a handle_info call with a %Broadcast{} struct that includes the topic, event, and payload of the broadcasted message.
Here are couple pieces of my code:
defmodule UcxChat.UserSocket do
  use Phoenix.Socket
  alias UcxChat.{User, Repo, MessageService, SideNavService}
  require UcxChat.ChatConstants, as: CC

  ## Channels
  channel CC.chan_room <> "*", UcxChat.RoomChannel      # "ucxchat:"
  channel CC.chan_user <> "*", UcxChat.UserChannel      # "user:"
  channel CC.chan_system <> "*", UcxChat.SystemChannel  # "system:"
  # ...
end

# user_channel.ex
# ...
intercept ["room:join", "room:leave", "room:mention", "user:state", "direct:new"]
# ...
def handle_out("room:join", msg, socket) do
  %{room: room} = msg
  UserSocket.push_message_box(socket, socket.assigns.channel_id, socket.assigns.user_id)
  update_rooms_list(socket)
  clear_unreads(room, socket)
  {:noreply, subscribe([room], socket)}
end

def handle_out("room:leave" = ev, msg, socket) do
  %{room: room} = msg
  debug ev, msg, "assigns: #{inspect socket.assigns}"
  socket.endpoint.unsubscribe(CC.chan_room <> room)
  update_rooms_list(socket)
  {:noreply, assign(socket, :subscribed, List.delete(socket.assigns[:subscribed], room))}
end

# ...
defp subscribe(channels, socket) do
  # debug inspect(channels), ""
  Enum.reduce channels, socket, fn channel, acc ->
    subscribed = acc.assigns[:subscribed]
    if channel in subscribed do
      acc
    else
      socket.endpoint.subscribe(CC.chan_room <> channel)
      assign(acc, :subscribed, [channel | subscribed])
    end
  end
end
# ...
end
I also use the user_channel for all events related to a specific user like client state, error messages, etc.
Disclaimer: I have not looked at the internal workings of a channel, this information is completely from my first experience of using channels in an application.
When someone joins a different group (based on the pattern matching in your join/3), a separate channel process is started over the same socket. Thus, broadcasting to A will not send messages to members of B, only A.
It seems to me the Channel module is similar to a GenServer, and join is somewhat like start_link, where a new server (process) is spun up (however, only if it does not already exist).
You can really ignore the inner workings of the module and just understand that if you join a channel with a different name than already existing ones, you are joining a unique channel. You can also just trust that if you broadcast to a channel, only members of that channel will get the message.
For instance, in my application, I have a user channel that I want only a single user to be connected to. The join looks like def join("agent:" <> _agent, payload, socket) where agent is just an email address. When I broadcast a message to this channel, only the single agent receives the message. I also have an office channel that all agents join and I broadcast to it when I want all agents to receive the message.
Hope this helps.

RabbitMQ - Message order of delivery

I need to choose a new queue broker for my new project.
This time I need a scalable queue that supports pub/sub, and keeping message ordering is a must.
I read Alexis's comment, where he writes:
"Indeed, we think RabbitMQ provides stronger ordering than Kafka"
I read the message ordering section in the RabbitMQ docs:
"Messages can be returned to the queue using AMQP methods that feature
a requeue
parameter (basic.recover, basic.reject and basic.nack), or due to a channel
closing while holding unacknowledged messages...With release 2.7.0 and later
it is still possible for individual consumers to observe messages out of
order if the queue has multiple subscribers. This is due to the actions of
other subscribers who may requeue messages. From the perspective of the queue
the messages are always held in the publication order."
Does that mean that if I need to handle messages in order, I can only use RabbitMQ with an exclusive queue for each consumer?
Is RabbitMQ still considered a good solution for ordered message queuing?
Well, let's take a closer look at the scenario you are describing above. I think it's important to paste the documentation immediately prior to the snippet in your question to provide context:
Section 4.7 of the AMQP 0-9-1 core specification explains the conditions under which ordering is guaranteed: messages published in one channel, passing through one exchange and one queue and one outgoing channel will be received in the same order that they were sent. RabbitMQ offers stronger guarantees since release 2.7.0. Messages can be returned to the queue using AMQP methods that feature a requeue parameter (basic.recover, basic.reject and basic.nack), or due to a channel closing while holding unacknowledged messages. Any of these scenarios caused messages to be requeued at the back of the queue for RabbitMQ releases earlier than 2.7.0. From RabbitMQ release 2.7.0, messages are always held in the queue in publication order, even in the presence of requeueing or channel closure. (emphasis added)
So, it is clear that RabbitMQ, from 2.7.0 onward, is making a rather drastic improvement over the original AMQP specification with regard to message ordering.
With multiple (parallel) consumers, order of processing cannot be guaranteed.
The third paragraph (pasted in the question) goes on to give a disclaimer, which I will paraphrase: "if you have multiple processors in the queue, there is no longer a guarantee that messages will be processed in order." All they are saying here is that RabbitMQ cannot defy the laws of mathematics.
Consider a line of customers at a bank. This particular bank prides itself on helping customers in the order they came into the bank. Customers line up in a queue, and are served by the next of 3 available tellers.
This morning, it so happened that all three tellers became available at the same time, and the next 3 customers approached. Suddenly, the first of the three tellers became violently ill, and could not finish serving the first customer in the line. By the time this happened, teller 2 had finished with customer 2 and teller 3 had already begun to serve customer 3.
Now, one of two things can happen. (1) The first customer in line can go back to the head of the line or (2) the first customer can pre-empt the third customer, causing that teller to stop working on the third customer and start working on the first. This type of pre-emption logic is not supported by RabbitMQ, nor any other message broker that I'm aware of. In either case, the first customer actually does not end up getting helped first - the second customer does, being lucky enough to get a good, fast teller off the bat. The only way to guarantee customers are helped in order is to have one teller helping customers one at a time, which will cause major customer service issues for the bank.
It is not possible to ensure that messages get handled in order in every possible case, given that you have multiple consumers. It doesn't matter if you have multiple queues, multiple exclusive consumers, different brokers, etc.: there is no way to guarantee a priori that messages are answered in order with multiple consumers. But RabbitMQ will make a best effort.
Message ordering is preserved in Kafka, but only within partitions rather than globally. If your data needs both global ordering and partitions, this does make things difficult. However, if you just need to make sure that all of the events for the same user, etc., end up in the same partition so that they are properly ordered, you may do so. The producer is in charge of the partition it writes to, so if you are able to logically partition your data, this may be preferable.
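For instance, with the Kafka Java client, giving every record for a given user the same key routes them all to the same partition; the broker address, topic name, and key below are placeholders:

import java.util.Properties;
import org.apache.kafka.clients.producer.*;
import org.apache.kafka.common.serialization.StringSerializer;

public class KeyedProducerDemo {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        try (Producer<String, String> producer = new KafkaProducer<>(props)) {
            String userId = "user-42"; // hypothetical partitioning key
            for (int i = 0; i < 5; i++) {
                // Same key -> same partition -> per-user ordering is preserved
                producer.send(new ProducerRecord<>("events", userId, "event-" + i));
            }
        }
    }
}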
I think there are two things in this question which are not the same: consumption order and processing order.
Message queues can, to a degree, guarantee that messages will get consumed in order; they can't, however, give you any guarantees on the order of their processing.
The main difference here is that there are some aspects of message processing which cannot be determined at consumption time, for example:
As mentioned, a consumer can fail while processing. Here the message's consumption order was correct, but the consumer failed to process it correctly, which will make it go back to the queue. At this point the consumption order is intact, but the processing order is not.
If by "processing" we mean that the message is now discarded and finished processing completely, then consider the case when your processing time is not linear, in other words processing one message takes longer than another. For example, if message 3 takes longer to process than usual, then messages 4 and 5 might get consumed and finish processing before message 3 does.
So even if you managed to get the message back to the front of the queue (which, by the way, violates the consumption order), you still cannot guarantee they will also be processed in order.
If you want to process the messages in order:
Have only 1 consumer instance at all times, or a main consumer and several stand-by consumers.
Or don't use a messaging queue and do the processing in a synchronous blocking method, which might sound bad, but in many cases and business requirements it is completely valid and sometimes even mission critical.
There are proper ways to guarantee the order of messages within RabbitMQ subscriptions.
If you use multiple consumers, they will process messages using a shared ExecutorService; see also ConnectionFactory.setSharedExecutor(...). You could set an Executors.newSingleThreadExecutor().
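A small sketch of that first option with the RabbitMQ Java client; the host is an assumption, and the fragment only shows the factory setup:

import com.rabbitmq.client.ConnectionFactory;
import java.util.concurrent.Executors;

ConnectionFactory factory = new ConnectionFactory();
factory.setHost("localhost"); // assumed broker host
// A single-threaded shared executor dispatches consumer callbacks on
// connections created from this factory one at a time, in order.
factory.setSharedExecutor(Executors.newSingleThreadExecutor());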
If you use one consumer with a single queue, you can bind this queue using multiple binding keys (they may have wildcards). The messages will be placed into the queue in the same order that they were received by the message broker.
For example, you have a single publisher that publishes messages where the order is important:
// exchange, messageCount, routingEven and routingOdd are defined as in the consumer snippet below
try (Connection connection2 = factory.newConnection();
     Channel channel2 = connection2.createChannel()) {
    // publish messages alternating between the two routing keys
    for (int i = 0; i < messageCount; i++) {
        final String routingKey = i % 2 == 0 ? routingEven : routingOdd;
        channel2.basicPublish(exchange, routingKey, null, ("Hello" + i).getBytes(UTF_8));
    }
}
You now might want to receive messages from both topics in a queue in the same order that they were published:
// declare a queue for the consumer
final String queueName = channel.queueDeclare().getQueue();
// we bind to queue with the two different routingKeys
final String routingEven = "even";
final String routingOdd = "odd";
channel.queueBind(queueName, exchange, routingEven);
channel.queueBind(queueName, exchange, routingOdd);
channel.basicConsume(queueName, true, new DefaultConsumer(channel) { ... });
The Consumer will now receive the messages in the order that they were published, regardless of the fact that you used different topics.
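In case the elided consumer body is unclear, a typical DefaultConsumer override looks like the following (the printing is just an example, reusing channel and queueName from above):

channel.basicConsume(queueName, true, new DefaultConsumer(channel) {
    @Override
    public void handleDelivery(String consumerTag, Envelope envelope,
                               AMQP.BasicProperties properties, byte[] body) {
        // Messages arrive here in publication order, from either binding
        System.out.println(envelope.getRoutingKey() + ": " + new String(body, UTF_8));
    }
});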
There are some good 5-Minute Tutorials in the RabbitMQ documentation that might be helpful:
https://www.rabbitmq.com/tutorials/tutorial-five-java.html

Questions on Topic filters in azure service bus

Couldn't find any articles answering this specific question so here goes.
Say you have a topic called companyorders and you have 3 filters/subscriptions: companyA, companyB, and allcompanies.
Messages sent to the topic for companyA get passed to the companyA and allcompanies subscriptions, etc. Then messages start coming in for a companyC that doesn't have a specific subscription set up, so they are sent only to the allcompanies subscription.
When companyC starts up their client app and it creates the companyC subscription (I don't see a way of setting up a subscription with a specific filter in the portal), how can I pull messages for companyC from the allcompanies subscription that were previously missed because the subscription was not set up beforehand?
Hope that makes sense.
Thanks
Paul
It seems that the subscription/filter needs to be set up before messages are sent to the topic. I tested this by creating a topic and a subscription. I then posted messages to the topic with a property DriverID, passing in DriverID = 1. This message ended up in the subscription set up earlier, as that subscription has the 'MatchAll' filter by default.
I then created another subscription with a filter of DriverID = 1. When I posted a message to the topic and set the property DriverID = 1, it got sent to the 2 subscriptions as expected. Messages posted before this subscription was set up that had DriverID = 1 were not automatically moved to the new subscription that matches the filter.
Paul

Can I filter the messages I receive from a message queue (MSMQ) by some property? (a.k.a. topic)

I am creating a Windows Service in C# that processes messages from a queue. I want to give ops the flexibility of partitioning the service in production according to properties of the message. For example, they should be able to say that one instance processes web orders from Customer A, another batch orders from Customer A, a third web or batch orders from Customer B, and so on.
My current solution is to assign separate queues to each customer\source combination. The process that puts orders into the queues has to make the right decision. My Windows Service can be configured to pull messages from one or more queues. It's messy, but it works.
No, but you can Peek into the queue and decide whether you really want to consume the message.
Use GetMessageEnumerator2() like this:
// Iterate over the queue with a cursor, without consuming messages
MessageEnumerator en = q.GetMessageEnumerator2();
while (en.MoveNext())
{
    // Only consume the message whose Label matches the desired "topic"
    if (en.Current.Label == label)
    {
        string body = ((XmlDocument)en.Current.Body).OuterXml;
        en.RemoveCurrent(); // dequeues just this message
        return body;
    }
}