Difference between message queues and mailboxes - operating-system

In operating systems, what is the difference between message queues and mailboxes?

I suspect there is no universally accepted definition of what makes a message queue versus a mailbox. Each RTOS may use different terminology and implementation details, so you'd have to look at each RTOS individually.
Generally speaking, some common differences include:
Is the size of the messages sent through the queue/mailbox fixed or can the message size vary?
Does the queue/mailbox hold a reference to the message or a copy of the message?
Can the queue/mailbox hold one message, multiple messages, or unlimited messages?

In computing, a queue generally has a very precise meaning: a container data structure with first-in-first-out (FIFO) access semantics. In an RTOS specifically, access to a queue is thread-safe and has blocking semantics.
A mailbox, on the other hand, has no generally accepted specific semantics, and I have seen the term used for very different RTOS IPC mechanisms. In some cases mailboxes are in fact queues, but if the RTOS also supports an IPC queue, a mailbox will have somewhat different semantics, often with respect to memory management. In other cases a mailbox may essentially be a queue of length 1, i.e. it has the blocking and IPC capability of a queue but no buffering. Such a mechanism allows synchronous communication between processes.
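One way to read "a queue of length 1 with no buffering" is a rendezvous: the sender blocks until a receiver has actually taken its message, so the two tasks synchronize at the hand-off. The following is a minimal Python sketch, assuming threads stand in for RTOS tasks. The class `Rendezvous` and its `put`/`get` methods are hypothetical and not taken from any particular RTOS's API.

```python
import threading

class Rendezvous:
    """Unbuffered mailbox: put() returns only after a receiver has taken the item."""

    def __init__(self):
        self._cond = threading.Condition()
        self._item = None
        self._full = False   # True while a message sits in the single slot
        self._taken = 0      # number of completed hand-offs so far

    def put(self, item):
        with self._cond:
            while self._full:              # one message in flight at a time
                self._cond.wait()
            self._item = item
            self._full = True
            ticket = self._taken + 1       # index of this particular hand-off
            self._cond.notify_all()
            while self._taken < ticket:    # block until *this* message is taken
                self._cond.wait()

    def get(self):
        with self._cond:
            while not self._full:
                self._cond.wait()
            item, self._item, self._full = self._item, None, False
            self._taken += 1
            self._cond.notify_all()        # release the sender and any waiting senders
            return item
```

Compare this with a buffered queue, where `put` returns as soon as the item is stored. Here the sender cannot run ahead of the receiver.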

Mailboxes are implemented using a queue and semaphores.
Suppose multiple threads are blocked trying to push data onto a full queue using the mailbox put() method. When space becomes available, only one thread sees it and is allowed to push its data onto the queue, as a single atomic step. Without that atomicity guarantee, another thread could push data onto the queue between the moment one thread checks the size and the moment it pushes its data.
Similarly, if more than one thread is waiting to get data from an empty queue, the get can also be implemented atomically.
The trade-off is that mailboxes have extra overhead compared to a plain queue.
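Here is a minimal sketch of that construction, in Python rather than SystemVerilog. It uses a plain deque for storage, a lock that guards the deque, and two counting semaphores: one for free slots and one for stored messages. Acquiring a semaphore is the atomic test-and-decrement, so another thread can never slip in between "check for space" and "claim the space". The method names loosely mirror SystemVerilog's `put`/`get`/`try_put`/`try_get`, but the class itself is an illustrative assumption, not that language's API.

```python
import threading
from collections import deque

class Mailbox:
    """Bounded FIFO mailbox: a plain deque guarded by a lock plus two semaphores."""

    def __init__(self, bound):
        self._items = deque()
        self._lock = threading.Lock()              # protects the deque itself
        self._free = threading.Semaphore(bound)    # counts empty slots
        self._used = threading.Semaphore(0)        # counts stored messages

    def _push(self, msg):
        with self._lock:
            self._items.append(msg)
        self._used.release()                       # wake one blocked getter, if any

    def _pop(self):
        with self._lock:
            msg = self._items.popleft()
        self._free.release()                       # wake one blocked putter, if any
        return msg

    def put(self, msg):
        self._free.acquire()      # atomic "is there space? then it's mine"
        self._push(msg)

    def get(self):
        self._used.acquire()      # atomic "is there a message? then it's mine"
        return self._pop()

    def try_put(self, msg):
        if not self._free.acquire(blocking=False):
            return False          # full: fail instead of blocking
        self._push(msg)
        return True

    def try_get(self):
        if not self._used.acquire(blocking=False):
            return False, None    # empty: fail instead of blocking
        return True, self._pop()
```

The extra lock and semaphore operations on every call are the overhead mentioned above. In everyday Python code, `queue.Queue(maxsize=n)` already provides the same blocking behaviour.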

Related

Systemverilog Mailbox and Queue

I don't understand why we prefer a mailbox over a queue for interprocess communication (e.g., communication between a driver and a scoreboard).
A mailbox is a built-in class around a queue that uses semaphores to control access to the ends of a queue. A mailbox only has FIFO element ordering whereas you can access the head, tail, or middle elements of a queue.
You typically use a mailbox when there are multiple threads reading and writing data, and you need the atomic test-and-set operation of a semaphore to know when the mailbox is full or empty. If you have only one process reading and writing to a queue, there is no need to use a mailbox. However, if there is more than one thread, a mailbox is a convenient class to use.
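To make the multi-thread case concrete, here is a hypothetical Python sketch of the driver/scoreboard scenario from the question. It uses the standard library's `queue.Queue` as a stand-in for a SystemVerilog mailbox, since both are thread-safe FIFOs with blocking put/get. A driver and a monitor each feed their own bounded mailbox, and the scoreboard blocks on both. The function names and the injected bug are illustrative only.

```python
import queue
import threading

def run_testbench(transactions, corrupt=None):
    """Driver and monitor threads feed a scoreboard through two bounded mailboxes."""
    expected = queue.Queue(maxsize=4)   # driver -> scoreboard
    observed = queue.Queue(maxsize=4)   # monitor -> scoreboard
    mismatches = []

    def driver():
        for txn in transactions:
            expected.put(txn)           # blocks whenever the scoreboard falls behind

    def monitor():
        for i, txn in enumerate(transactions):
            # Pretend the DUT echoed the transaction; optionally inject one bug.
            observed.put(txn + 1 if i == corrupt else txn)

    def scoreboard():
        for _ in transactions:
            exp = expected.get()        # each get blocks until a message arrives
            got = observed.get()
            if exp != got:
                mismatches.append((exp, got))

    threads = [threading.Thread(target=f) for f in (driver, monitor, scoreboard)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return mismatches
```

For example, `run_testbench(list(range(10)), corrupt=3)` reports the single mismatch `(3, 4)`. Neither producer has to poll or check sizes before writing.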
In the UVM, we use a TLM FIFO which is another wrapper around a mailbox. TLM connections provide an isolating interface so you don't have to know what is on the other side of your port. See https://verificationacademy.com/sessions/how-tlm-works

How can (messaging) queue be scalable?

I frequently see queues in software architecture, especially in systems described as "scalable", a prominent example being actors on the Akka.io multi-actor platform. However, how can a queue be scalable if we have to synchronize placing messages in the queue (and therefore operate single-threaded rather than multi-threaded) and again synchronize taking messages out of the queue (to ensure that each message is taken exactly once)? It gets even more complicated when those messages can change the state of the (actor) system. In that case, even after a message is taken out of the queue, it cannot be load balanced but must still be processed in a single thread.
1. Is it correct that putting messages into a queue must be synchronized?
2. Is it correct that taking messages out of a queue must be synchronized?
3. If 1 or 2 is correct, then how is a queue scalable? Doesn't synchronizing to a single thread immediately create a bottleneck?
4. How can an (actor) system be scalable if it is stateful?
5. Does a stateful actor/bean mean that I have to process messages in a single thread and in order?
6. Does statefulness mean that I have to have a single copy of a bean/actor for the entire system?
7. If 6 is false, then how do I share this state between instances?
8. When I try to connect a new P2P node to the network, I believe I need some "server" that tells me who the other peers are; is that correct? When I try to download a torrent, I have to connect to a tracker; if there is a "server", then why do we call it P2P? If the tracker goes down, then I cannot connect to peers; is that correct?
9. Do synchronization and statefulness destroy scalability?
Is it correct that putting messages into a queue must be synchronized?
Is it correct that taking messages out of a queue must be synchronized?
No.
If we're talking about Java's synchronized keyword, that is a reentrant mutual-exclusion lock on an object. Even with multiple threads accessing that lock, it can be fast as long as contention is low. And each object has its own lock, so there are many locks, each of which only needs to be held for a short time, i.e. it is fine-grained locking.
But even if locking were a bottleneck, queues need not be implemented with mutual-exclusion locks at all. Lock-free and even wait-free queue data structures exist. So the mere need for synchronization does not automatically imply single-threaded execution.
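Python has no compare-and-swap, so this sketch does not show a lock-free queue. It shows an intermediate design instead: the two-lock queue described by Michael and Scott. Producers take only a tail lock and consumers take only a head lock, so enqueues and dequeues never wait for each other. Java's `ConcurrentLinkedQueue` documents a non-blocking algorithm from the same authors. The Python classes here are illustrative, not a library API.

```python
import threading

class _Node:
    __slots__ = ("value", "next")

    def __init__(self, value=None):
        self.value = value
        self.next = None

class TwoLockQueue:
    """Unbounded FIFO with separate locks for the enqueue and dequeue ends."""

    def __init__(self):
        dummy = _Node()                   # head always points at a dummy node
        self._head = dummy
        self._tail = dummy
        self._head_lock = threading.Lock()
        self._tail_lock = threading.Lock()

    def enqueue(self, value):
        node = _Node(value)
        with self._tail_lock:             # producers contend only with producers
            self._tail.next = node
            self._tail = node

    def try_dequeue(self):
        with self._head_lock:             # consumers contend only with consumers
            first = self._head.next
            if first is None:
                return False, None        # queue is empty
            value = first.value
            first.value = None            # first becomes the new dummy node
            self._head = first
            return True, value
```

The dummy node is what lets the two ends work independently. A dequeuer never touches the tail pointer, and an enqueuer never touches the head pointer.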
The rest of your questions should be asked separately because they are not about message queuing.
Of course, you are correct that a single queue is not scalable. The point of the Actor Model is that you can have millions of actors and therefore distribute the load over millions of queues, if you have that many cores in your cluster. Always remember what Carl Hewitt said:
One Actor is no actor. Actors come in systems.
Each single actor is a fully sequential and single-threaded unit of computation. The whole model is constructed such that it is perfectly suited to describe distribution, though; this means that you create as many actors as you need.
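Here is a toy Python sketch of that idea, assuming one OS thread per actor. Real actor runtimes such as Akka multiplex many actors over a small thread pool, which is what makes millions of actors affordable. Each actor owns private state and a private mailbox and handles one message at a time, so its state needs no lock. Scaling comes from having many such actors, not from making one of them faster. `CounterActor` and its message names are hypothetical.

```python
import queue
import threading

class CounterActor:
    """Hypothetical actor: private state, private mailbox, one message at a time."""

    def __init__(self):
        self._mailbox = queue.Queue()    # the only way to reach this actor
        self._count = 0                  # touched only by the actor's own thread
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def tell(self, msg, reply_to=None):
        """Asynchronous send: enqueue the message and return immediately."""
        self._mailbox.put((msg, reply_to))

    def _run(self):
        while True:
            msg, reply_to = self._mailbox.get()   # strictly sequential processing
            if msg == "incr":
                self._count += 1
            elif msg == "get":
                reply_to.put(self._count)
            elif msg == "stop":
                return
```

Many client threads can `tell` many actors concurrently. Each actor's counter stays consistent because only that actor's own thread ever modifies it.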

How are distributed queues architected?

What are architectural patterns/solutions that make distributed queues tick?
Please share for both ordered and non-ordered types.
You can think of the backend of a queue as a replicated database. (I am assuming the queues you are talking about consider themselves as durable: when they accept a message, they guarantee at least once delivery.)
As a replicated database, the message queue backend uses a replication protocol to make sure the message is on at least N hosts before acknowledging receipt to the sender. Common replication protocols are 2PC, 3PC, and consensus protocols like Raft, Multi-Paxos, and Chain Replication.
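This deliberately naive sketch shows only the acknowledgement rule: write to every replica and report success to the sender only when at least N of them confirm. It leaves out everything the protocols named above exist to solve, such as concurrent writers, partial failures, and agreeing on a single history. `Replica` and `accept` are hypothetical names.

```python
class Replica:
    """Hypothetical storage node; up=False simulates a crashed or unreachable host."""

    def __init__(self, up=True):
        self.up = up
        self.log = []

    def store(self, msg):
        if not self.up:
            return False
        self.log.append(msg)
        return True

def accept(msg, replicas, n_required):
    """Acknowledge to the sender only if at least n_required replicas stored msg."""
    acks = sum(1 for r in replicas if r.store(msg))
    return acks >= n_required
```

When `accept` returns False, some replicas may still hold the message. The sender then retries, and the resulting duplicate is one reason such queues promise at-least-once rather than exactly-once delivery.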
To send a message to a receiver, you have to do almost the same replication with a message lease. The queue server reserves the message for a certain period of time and sends it to the receiver. If and when the receiver acknowledges receipt of the message, the server deletes it. Otherwise, the server resends the message to the next available receiver.
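Here is a single-process Python model of the lease behaviour described above, with the replication left out. A received message becomes invisible to other receivers until its lease expires. If no acknowledgement arrives in time, the message is offered again. The injectable `clock` is an assumption that makes the timing testable, and the class name is hypothetical.

```python
import itertools

class LeaseQueue:
    """In-memory model of delivery with a message lease (visibility timeout)."""

    def __init__(self, lease_seconds, clock):
        self._lease = lease_seconds
        self._clock = clock              # callable returning the current time
        self._ids = itertools.count()
        self._ready = []                 # (msg_id, body) available for delivery
        self._leased = {}                # msg_id -> (body, lease_expiry)

    def send(self, body):
        self._ready.append((next(self._ids), body))

    def _expire_leases(self):
        now = self._clock()
        for msg_id, (body, expiry) in list(self._leased.items()):
            if expiry <= now:            # receiver never acknowledged: redeliver
                del self._leased[msg_id]
                self._ready.append((msg_id, body))

    def receive(self):
        self._expire_leases()
        if not self._ready:
            return None
        msg_id, body = self._ready.pop(0)
        self._leased[msg_id] = (body, self._clock() + self._lease)
        return msg_id, body

    def ack(self, msg_id):
        # Deleting only on ack is what makes delivery at-least-once.
        return self._leased.pop(msg_id, None) is not None
```

Note that a redelivered message goes to the back of the queue, so even this tiny model does not preserve send order once leases expire.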
Some message queues stop there; others add lots of bells and whistles. SQS is one queue implementation that doesn't add many bells and whistles, so that it can scale further. This allows them, for example, to shard the queue so that one SQS queue is actually made up of many (even thousands) of the queues described above. As an aside, I once heard one SQS developer ask another, "What does 'ordering' mean when you are accepting millions of messages per second?"
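This hypothetical Python sketch shows what sharding does to ordering. It does not describe how SQS is actually built, since those internals are not public. Writes are spread round-robin over independent FIFO shards, and a receiver takes from whichever shard it happens to try first. Each shard stays FIFO, but there is no longer any global order.

```python
import itertools
import random
from collections import deque

class ShardedQueue:
    """One logical queue spread over several independent FIFO shards."""

    def __init__(self, num_shards, seed=None):
        self._shards = [deque() for _ in range(num_shards)]
        self._next_shard = itertools.cycle(range(num_shards))  # spread writes evenly
        self._rng = random.Random(seed)

    def send(self, msg):
        self._shards[next(self._next_shard)].append(msg)

    def receive(self):
        # Start at a random shard and take from the first non-empty one.
        # Each shard is FIFO on its own, but nothing orders messages across shards.
        n = len(self._shards)
        start = self._rng.randrange(n)
        for offset in range(n):
            shard = self._shards[(start + offset) % n]
            if shard:
                return shard.popleft()
        return None
```

Adding shards adds throughput almost linearly, because no shard ever coordinates with another. That lack of coordination is exactly why "ordering" stops being well defined.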
That being said, some queues do provide strong ordering guarantees. (I have implemented a couple of these types of systems.) The cost is less ability to scale. To maintain ordering, the queue's complexity goes way up: the queue has to maintain an ordered log of all the messages and replicate the same ordering across its servers. This is much, much harder than unordered replication. Ordered queue systems typically elect a master to maintain the ordering, and all messages are routed to the master. They also tend to use the more complex replication protocols.
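Here is a minimal Python sketch of the "single master assigns the order" pattern. Leader election and the actual replication protocol (Raft, Multi-Paxos, and so on) are omitted. The leader is the only place sequence numbers are created, and each follower applies entries strictly in sequence order, buffering any that arrive early. `Leader` and `Follower` are hypothetical names.

```python
class Follower:
    """Applies log entries strictly in sequence order, buffering early arrivals."""

    def __init__(self):
        self.log = []
        self._pending = {}               # seq -> message, waiting for a gap to fill

    def deliver(self, seq, msg):
        self._pending[seq] = msg
        while len(self.log) in self._pending:          # next expected seq present?
            self.log.append(self._pending.pop(len(self.log)))

class Leader:
    """The elected master: every message is routed here to get its position."""

    def __init__(self, followers):
        self._next_seq = 0
        self._followers = followers

    def append(self, msg):
        seq = self._next_seq
        self._next_seq += 1
        for f in self._followers:
            f.deliver(seq, msg)
        return seq
```

Every message funnels through one `Leader.append`, which is the scalability cost the answer describes. In exchange, all followers end up with byte-for-byte identical logs even when the network reorders deliveries.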

What methods are available on unix for pub sub IPC?

There are various options for IPC.
Over a network:
for client-server, can use TCP
for pub sub, can use UDP multicast
Locally:
for client-server, can use unix domain sockets
for pub sub, can use ???
I suppose what I'd be interested in is some kind of file descriptor that supports many readers (subscribers) and many writers (publishers) simultaneously. Is this usage pattern feasible/efficient on unix?
After much googling I haven't found much in the way of IPC multicast, so I have decided to write a program, pubsub, that takes a publisher address and a subscriber address as arguments, listens for and accepts connections on these two addresses, and then writes each payload received on a publisher connection to each of the subscriber connections. It wouldn't surprise me if this is inefficient or reinvents the wheel, but I have not come across a better solution.
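Here is a minimal sketch of the pubsub program described above, in Python with the standard `selectors` module. It assumes Unix domain stream sockets at two filesystem paths. The paths, the class name, and the 4096-byte read size are illustrative. It forwards raw byte chunks without message framing, and a subscriber that stops reading will eventually block the whole broker, so treat it as a starting point rather than a finished tool.

```python
import os
import selectors
import socket

class PubSubBroker:
    """Copies every byte chunk read from any publisher to every subscriber."""

    def __init__(self, pub_path, sub_path):
        self.sel = selectors.DefaultSelector()
        self.subscribers = set()
        self._running = True
        for path, role in ((pub_path, "pub_listener"), (sub_path, "sub_listener")):
            if os.path.exists(path):
                os.unlink(path)                      # remove a stale socket file
            srv = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
            srv.bind(path)
            srv.listen()
            srv.setblocking(False)
            self.sel.register(srv, selectors.EVENT_READ, role)

    def serve(self, poll_seconds=0.1):
        while self._running:
            for key, _ in self.sel.select(poll_seconds):
                sock, role = key.fileobj, key.data
                if role == "pub_listener":
                    conn, _ = sock.accept()
                    conn.setblocking(True)
                    self.sel.register(conn, selectors.EVENT_READ, "publisher")
                elif role == "sub_listener":
                    conn, _ = sock.accept()
                    conn.setblocking(True)
                    self.subscribers.add(conn)       # subscribers are write-only here
                else:                                # data from a publisher
                    chunk = sock.recv(4096)
                    if not chunk:                    # publisher disconnected
                        self.sel.unregister(sock)
                        sock.close()
                        continue
                    for sub in list(self.subscribers):
                        try:
                            sub.sendall(chunk)       # blocking: a stalled reader stalls us
                        except OSError:              # subscriber went away
                            self.subscribers.discard(sub)
                            sub.close()
        self._close_all()

    def stop(self):
        self._running = False

    def _close_all(self):
        for key in list(self.sel.get_map().values()):
            self.sel.unregister(key.fileobj)
            key.fileobj.close()
        for sub in self.subscribers:
            sub.close()
        self.sel.close()
```

Run `serve()` in its own thread or process. Publishers and subscribers then connect with ordinary `AF_UNIX` stream sockets to the two paths.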
I was looking for solutions to a similar problem and found /dev/fanout. Fanout is a kernel module that replicates its input to all processes reading from it; you can think of it as an IPC broadcast mechanism. According to the author, it works well for small data payloads. Multiple processes can write to the device, and multiple processes can read from it. I am not sure about the atomicity of writes, though; small writes from multiple processes should occur atomically, as with FIFOs.
More about Fanout:
http://compgroups.net/comp.linux.development.system/-dev-fanout-a-one-to-many-multi/2869739
http://www.linuxtoys.org/fanout/fanout.html
There are also POSIX message queues. As man mq_overview puts it:
POSIX message queues allow processes to exchange data in the form of messages. This API is distinct from that provided by System V message queues (msgget(2), msgsnd(2), msgrcv(2), etc.), but provides similar functionality.
Message queues are created and opened using mq_open(3); this function returns a message queue descriptor (mqd_t), which is used to refer to the open message queue in later calls. Each message queue is identified by a name of the form /somename; that is, a null-terminated string of up to NAME_MAX (i.e., 255) characters consisting of an initial slash, followed by one or more characters, none of which are slashes. Two processes can operate on the same queue by passing the same name to mq_open(3).
Messages are transferred to and from a queue using mq_send(3) and mq_receive(3). When a process has finished using the queue, it closes it using mq_close(3), and when the queue is no longer required, it can be deleted using mq_unlink(3).
Queue attributes can be retrieved and (in some cases) modified using mq_getattr(3) and mq_setattr(3). A process can request asynchronous notification of the arrival of a message on a previously empty queue using mq_notify(3).
A message queue descriptor is a reference to an open message queue description (cf. open(2)). After a fork(2), a child inherits copies of its parent's message queue descriptors, and these descriptors refer to the same open message queue descriptions as the corresponding descriptors in the parent. Corresponding descriptors in the two processes share the flags (mq_flags) that are associated with the open message queue description.
Each message has an associated priority, and messages are always delivered to the receiving process highest priority first.
Message priorities range from 0 (low) to sysconf(_SC_MQ_PRIO_MAX) - 1 (high). On Linux, sysconf(_SC_MQ_PRIO_MAX) returns 32768, but POSIX.1 requires only that an implementation support at least priorities in the range 0 to 31; some implementations provide only this range.
A more friendly introduction by Michael Kerrisk is available here: http://man7.org/conf/lca2013/IPC_Overview-LCA-2013-printable.pdf
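Python's standard library has no binding for the mq_* calls, so this is not the POSIX API. It is a small in-memory model of the delivery rule quoted above: the highest priority comes first, and messages of equal priority come out in the order they were sent (POSIX specifies that a new message goes after existing messages of the same priority). The default limit of 32768 follows the Linux value from the quote, and the class name is hypothetical.

```python
import heapq
import itertools

class PriorityMessageQueue:
    """Models POSIX mq delivery order; not a wrapper around the real mq_* calls."""

    def __init__(self, prio_max=32768):
        self._heap = []
        self._seq = itertools.count()    # tie-breaker keeps FIFO within one priority
        self._prio_max = prio_max

    def send(self, msg, prio):
        if not 0 <= prio < self._prio_max:
            raise ValueError("priority out of range")
        # heapq is a min-heap, so store the negated priority to pop the highest first.
        heapq.heappush(self._heap, (-prio, next(self._seq), msg))

    def receive(self):
        # The real mq_receive blocks on an empty queue; this model raises IndexError.
        neg_prio, _, msg = heapq.heappop(self._heap)
        return msg, -neg_prio
```

For portable code, keep priorities within 0 to 31, the minimum range POSIX guarantees.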

MSMQ as a job queue

I am trying to implement a job queue with MSMQ to save myself the time of implementing it in SQL. After reading around, I realized MSMQ might not offer what I am after. Could you please advise me whether my plan is realistic using MSMQ, or recommend an alternative?
I have a number of processes picking up jobs from a queue (I might need to scale out in the future). Once a job is picked up, processing follows; during this time the job is locked against other processes by its status. If needed, the job is put back into the queue (its status changes again) for further processing, but physically the job stays in the queue until it is completed.
MSMQ doesn't let me keep the message in the queue while working on it. I can peek or read, but read takes the message out of the queue, and peek doesn't allow changing the message (its status).
Using MSMQ as a datastore is probably bad as it's not designed for storage at all. Unless the queues are transactional the messages may not even get written to disk.
Certainly updating queue items in-situ is not supported for the reasons you state.
If you don't want a full blown relational DB you could use an in-memory cache of some kind, like memcached, or a cheap object db like raven.
Take a look at RabbitMQ, or one of the many other message queues. Most offer this functionality out of the box.
For example, RabbitMQ calls what you are describing Work Queues. Multiple consumers can pull from the same queue without pulling the same item. Furthermore, if you use acknowledgements and the processing fails, the item is not removed from the queue.
.NET examples:
https://www.rabbitmq.com/tutorials/tutorial-two-dotnet.html
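This is not the RabbitMQ client API. It is a small in-memory Python model of the work-queue behaviour described above. Each message is handed to exactly one consumer and stays "unacknowledged" until that consumer acks it. If the consumer dies first, the message goes back into the queue for someone else. In RabbitMQ the trigger is the consumer's channel or connection closing; here it is an explicit, hypothetical `consumer_died` call.

```python
from collections import deque

class WorkQueue:
    """In-memory model of a work queue with manual acknowledgements."""

    def __init__(self):
        self._ready = deque()
        self._unacked = {}                 # delivery tag -> (consumer, message)
        self._next_tag = 1

    def publish(self, message):
        self._ready.append(message)

    def deliver(self, consumer):
        """Give the next ready message to exactly one consumer, or None if idle."""
        if not self._ready:
            return None
        tag, self._next_tag = self._next_tag, self._next_tag + 1
        message = self._ready.popleft()
        self._unacked[tag] = (consumer, message)
        return tag, message

    def ack(self, tag):
        del self._unacked[tag]             # only now is the message really gone

    def consumer_died(self, consumer):
        """Put everything the dead consumer never acknowledged back at the front."""
        orphaned = [t for t, (owner, _) in self._unacked.items() if owner == consumer]
        requeue = [self._unacked.pop(t)[1] for t in orphaned]
        self._ready.extendleft(reversed(requeue))
```

The "in progress" state lives in the broker rather than in the message body. That is why no in-place status update is needed.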
EDIT: After using MSMQ myself, I think it would probably work very well for what you are doing. The key is to use transactions and multiple queues. For example, each status should have its own queue. It's fairly safe to "move" messages from one queue to another, since the move occurs within a transaction. This moving of messages is essentially your change of status.
We also use the Message Extension byte array for storing message metadata, like status. This way we don't have to alter the actual message when moving it to another queue.
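Here is a Python model of the "one queue per status, move inside a transaction" pattern. The `Transaction` class is a hypothetical undo log standing in for a real MSMQ transaction (in .NET, `System.Messaging.MessageQueueTransaction`). If processing fails partway through, every queue operation in the move is rolled back and the job stays in its original status queue. The model is single-threaded, and the queue names are illustrative.

```python
from collections import deque

class Transaction:
    """Tiny undo log: either every queue operation inside commits, or none does."""

    def __init__(self):
        self._undo = []

    def receive(self, q):
        msg = q.popleft()
        self._undo.append(lambda: q.appendleft(msg))   # undo: put it back in front
        return msg

    def send(self, q, msg):
        q.append(msg)
        self._undo.append(lambda: q.pop())             # undo: remove what we added

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        if exc_type is not None:                       # abort: roll back in reverse
            for undo in reversed(self._undo):
                undo()
        return False                                   # let the exception propagate

def move(queues, src, dst, work=lambda msg: None):
    """Change a job's status by moving it between status queues atomically."""
    with Transaction() as tx:
        msg = tx.receive(queues[src])
        work(msg)                  # if this raises, the receive is undone
        tx.send(queues[dst], msg)
    return msg
```

For example, `move(queues, "pending", "processing")` is the status change. A crash inside `work` leaves the job where it was, ready for another worker.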
MSMQ, and queues in general, require a different set of patterns than most programmers are used to. Keep that in mind.
Perhaps, if you can give more information on why you need to peek for messages that are currently in process, there would be a way to handle that scenario with MSMQ. You could always add a database for additional tracking.