How are distributed queues architected?

What are architectural patterns/solutions that make distributed queues tick?
Please share for both ordered and non-ordered types.

You can think of the backend of a queue as a replicated database. (I am assuming the queues you are talking about consider themselves durable: when they accept a message, they guarantee at-least-once delivery.)
As a replicated database, the message queue backend uses a replication protocol to make sure the message is on at least N hosts before acknowledging receipt to the sender. Common choices are 2PC, 3PC, consensus protocols like Raft and Multi-Paxos, and schemes like Chain Replication.
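As a rough illustration of the quorum-ack idea (all names here are hypothetical; real protocols like Raft also handle leader election, retries, and failure detection), a sketch in Java:

import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch: fan a message out to replicas and complete the
// sender's acknowledgement only once N replicas have confirmed the write.
public class QuorumWriter {

    interface Replica {
        CompletableFuture<Void> store(byte[] message); // completes when durable on that host
    }

    private final List<Replica> replicas;
    private final int quorum; // N: minimum copies before acking the sender

    QuorumWriter(List<Replica> replicas, int quorum) {
        this.replicas = replicas;
        this.quorum = quorum;
    }

    CompletableFuture<Void> append(byte[] message) {
        CompletableFuture<Void> acked = new CompletableFuture<>();
        AtomicInteger confirmed = new AtomicInteger();
        for (Replica r : replicas) {
            r.store(message).thenRun(() -> {
                // Ack the sender as soon as the quorum is reached;
                // remaining replicas catch up in the background.
                if (confirmed.incrementAndGet() == quorum) {
                    acked.complete(null);
                }
            });
        }
        return acked;
    }
}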
To deliver a message to a receiver, the queue does almost the same replication, with a message lease. The queue server reserves the message for a certain period of time; it sends the message to the receiver, and if/when the receiver acknowledges receipt of the message, the server deletes it. Otherwise, the server will resend the message to the next available receiver.
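The lease bookkeeping might look something like this minimal sketch (hypothetical names; a real server would also persist and replicate this state):

import java.time.Duration;
import java.time.Instant;
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch of a message lease: a message handed to a receiver is
// hidden for the lease duration; an ack deletes it, a missing ack makes it
// eligible for redelivery to the next available receiver.
public class LeaseTable {

    private final Map<String, Instant> leases = new ConcurrentHashMap<>(); // messageId -> lease expiry
    private final Duration leaseDuration;

    LeaseTable(Duration leaseDuration) {
        this.leaseDuration = leaseDuration;
    }

    // Called when a message is handed to a receiver.
    void reserve(String messageId) {
        leases.put(messageId, Instant.now().plus(leaseDuration));
    }

    // Called when the receiver acknowledges; the server can now delete the message.
    void acknowledge(String messageId) {
        leases.remove(messageId);
    }

    // True if the lease expired without an ack; the message should be redelivered.
    boolean isExpired(String messageId) {
        return Optional.ofNullable(leases.get(messageId))
                .map(expiry -> Instant.now().isAfter(expiry))
                .orElse(false);
    }
}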
Some message queues stop there, others add lots of bells and whistles. SQS is one queue implementation that doesn't add many bells and whistles, so that it can scale more. This allows it, for example, to shard the queue so that one SQS queue is actually made of many (even thousands) of the queues described above. As an aside, I once heard one SQS developer ask another "What does 'ordering' mean when you are accepting millions of messages per second?"
That being said, some queues do provide strong ordering guarantees. (I have implemented a couple of these types of systems.) The cost of this is reduced scalability. To maintain ordering, the queue's complexity goes way up: it has to maintain an ordered log of all the messages and replicate that same ordering across its servers. This is much, much harder than unordered replication. Ordered queue systems typically elect a master to maintain the ordering, and all messages are routed through the master. They also tend to use the more complex replication protocols.

Related

Implementing sagas with Kafka

I am using Kafka for Event Sourcing and I am interested in implementing sagas using Kafka.
Any best practices on how to do this? The Commander pattern mentioned here seems close to the architecture I am trying to build but sagas are not mentioned anywhere in the presentation.
This talk from this year's DDD eXchange is the best resource I came across with regard to the Process Manager/Saga pattern in event-driven/CQRS systems:
https://skillsmatter.com/skillscasts/9853-long-running-processes-in-ddd
(requires registering for a free account to view)
The demo shown there lives on github: https://github.com/flowing/flowing-retail
I've given it a spin and I quite like it. I do recommend watching the video first to set the stage.
Although the approach shown is message-bus agnostic, the demo uses Kafka for the Process Manager to send commands to and listen to events from other bounded contexts. It does not use Kafka Streams but I don't see why it couldn't be plugged into a Kafka Streams topology and become part of the broader architecture like the one depicted in the Commander presentation you referenced.
I hope to investigate this further for our own needs, so please feel free to start a thread on the Kafka users mailing list, that's a good place to collaborate on such patterns.
Hope that helps :-)
I would like to add something here about sagas and Kafka.
In general
Kafka is a tad different than a normal queue. It's especially good at scaling, and this can actually cause some complications.
As one of the means to accomplish scaling, Kafka partitions the data stream. Data is placed in partitions, each of which can be consumed at its own rate, independent of the other partitions of the same topic. Here is some info on it: how-choose-number-topics-partitions-kafka-cluster. I'll come back to why this is important.
The most common ways to ensure ordering within Kafka are:
Use 1 partition for the topic
Use a message key to "assign" the message to a partition (see the sketch below)
In both scenarios your chronologically dependent messages need to stream through the same partition.
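As a minimal sketch (the broker address, topic name, and saga id are all made up), keying records by a saga/order id makes every record with that key hash to the same partition, so chronologically dependent messages stay in order:

import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

Properties props = new Properties();
props.put("bootstrap.servers", "localhost:9092"); // assumed broker address
props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
props.put("enable.idempotence", "true"); // avoid duplicates from producer retries

try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
    // Same key -> same partition -> order preserved for this saga instance.
    String sagaId = "order-42";
    producer.send(new ProducerRecord<>("saga-events", sagaId, "RoomReserved"));
    producer.send(new ProducerRecord<>("saga-events", sagaId, "RoomPaid"));
}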
Also, as @pranjal thakur points out, make sure the delivery method is set to "exactly once", which has a performance impact but ensures you will not receive messages multiple times.
The caveat
Now, here's the caveat: when changing the number of partitions, the distribution of messages over the partitions (when using a key) will change as well.
Under normal conditions this can be handled easily. But in a high-traffic situation, the migration toward a different number of partitions can result in a window of time in which a saga "flow" is handled over multiple partitions, and the order is not guaranteed at that point.
It's up to you whether this will be an issue in your scenario.
Here are some questions you can ask to determine if this applies to your system:
What will happen if you somehow need to migrate/copy data to a new system, using Kafka? (high-traffic scenario)
Can you send your data to 1 topic?
What will happen after a temporary outage of your saga service? (low-availability/high-traffic scenario)
What will happen when you need to replay a bunch of messages? (high-traffic scenario)
What will happen if we need to increase the partitions? (high-traffic/outage-and-recovery scenario)
The alternative
If you're thinking of setting up a saga, based on steps, like a state machine, I would challenge you to rethink your design a bit.
I'll give an example:
Let's consider a booking-a-hotel-room process.
Simplified, it might consist of the following steps:
Handle room reserved (incoming event)
Handle room paid (incoming event)
Send acknowledgement of the booking (after payment and some processing)
Now, if your saga is not able to handle the payment if the reservation hasn't come in yet, then you are relying on the order of events.
In this case you should ask yourself: when will this break?
If you conclude you want to avoid the chronological dependency, consider a system without a saga, or a saga which does not depend on the order of events, i.e. one that accepts all messages even when it's not their turn yet in the process (see the sketch after the examples below).
Some examples:
aggregators
Modeled as business process: parallel gateways (parallel process flows)
Do note that in such a setup it is even more crucial that every action has an implemented compensating action (rollback action).
I know this is often hard to accomplish, but if you start small, you might start to like it :-)
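To make the hotel-room example concrete, here is a minimal sketch (names are hypothetical) of a saga that accepts the two events in either order and only sends the acknowledgement once both have arrived:

// Hypothetical sketch: an order-independent saga for the hotel-room booking.
// Either event may arrive first; the acknowledgement goes out once both are in.
public class BookingSaga {

    private boolean roomReserved;
    private boolean roomPaid;

    synchronized void onRoomReserved() {
        roomReserved = true;
        maybeAcknowledge();
    }

    synchronized void onRoomPaid() {
        roomPaid = true; // accepted even if the reservation hasn't arrived yet
        maybeAcknowledge();
    }

    private void maybeAcknowledge() {
        if (roomReserved && roomPaid) {
            sendAcknowledgement();
        }
    }

    private void sendAcknowledgement() {
        System.out.println("Booking acknowledged");
    }
}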

How can a (messaging) queue be scalable?

I frequently see queues in software architecture, especially "scalable" ones, with the Actor from the Akka.io multi-actor platform as a prominent representative. However, how can a queue be scalable if we have to synchronize placing messages in the queue (and therefore operate single-threaded rather than multi-threaded) and again synchronize taking messages out of the queue (to assure that each message is taken exactly once)? It gets even more complicated when those messages can change the state of the (actor) system: in this case, even after taking a message out of the queue, it cannot be load balanced, but must still be processed in a single thread.
1. Is it correct that putting messages in the queue must be synchronized?
2. Is it correct that taking messages out of the queue must be synchronized?
3. If 1 or 2 is correct, then how is the queue scalable? Doesn't synchronization to a single thread immediately create a bottleneck?
4. How can an (actor) system be scalable if it is stateful?
5. Does a stateful actor/bean mean that I have to process messages in a single thread and in order?
6. Does statefulness mean that I have to have a single copy of the bean/actor per entire system?
7. If 6 is false, then how do I share this state between instances?
8. When I am trying to connect my new P2P node to the network, I believe I have to have some "server" that will tell me who the other peers are; is that correct? When I am trying to download a torrent, I have to connect to a tracker. If there is a "server", then why do we call it P2P? If this tracker goes down, then I cannot connect to peers; is that correct?
9. Are synchronization and statefulness destroying scalability?
Is it correct that putting messages in the queue must be synchronized?
Is it correct that taking messages out of the queue must be synchronized?
No.
Assuming we're talking about the synchronized Java keyword, then that is a reentrant mutual exclusion lock on the object. Even multiple threads accessing that lock can be fast as long as contention is low. And each object has its own lock, so there are many locks, each of which only needs to be held for a short time, i.e. it is fine-grained locking.
But even so, queues need not be implemented via mutual exclusion locks. Lock-free and even wait-free queue data structures exist, which means that concurrent access does not automatically imply single-threaded execution.
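For example, java.util.concurrent.ConcurrentLinkedQueue is a lock-free, CAS-based queue: many threads can offer and poll concurrently without any of them ever holding a mutex. A small sketch:

import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;

// Lock-free queue: producers and consumers make progress via compare-and-swap,
// with no mutual exclusion lock and no single-threaded bottleneck.
Queue<String> queue = new ConcurrentLinkedQueue<>();

Runnable producer = () -> {
    for (int i = 0; i < 1_000; i++) {
        queue.offer("message-" + i); // never blocks
    }
};
Runnable consumer = () -> {
    String msg;
    while ((msg = queue.poll()) != null) { // null means "currently empty", not "locked"
        // process msg
    }
};
new Thread(producer).start();
new Thread(consumer).start();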
The rest of your questions should be asked separately because they are not about message queuing.
Of course you are correct in that a single queue is not scalable. The point of the Actor Model is that you can have millions of Actors and therefore distribute the load over millions of queues—if you have so many cores in your cluster. Always remember what Carl Hewitt said:
One Actor is no actor. Actors come in systems.
Each single actor is a fully sequential and single-threaded unit of computation. The whole model is constructed such that it is perfectly suited to describe distribution, though; this means that you create as many actors as you need.
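A toy sketch of that idea in plain Java (deliberately not the Akka API): give each actor its own single-threaded mailbox, so ordering holds per actor while scalability comes from the number of actors. Real actor runtimes multiplex many actors over a fixed thread pool rather than dedicating a thread to each one.

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Sketch: one single-threaded executor per actor. Messages to the same actor
// are processed strictly in order; different actors run in parallel.
public class TinyActorSystem {

    private final Map<String, ExecutorService> mailboxes = new ConcurrentHashMap<>();

    void tell(String actorName, Runnable message) {
        mailboxes
            .computeIfAbsent(actorName, name -> Executors.newSingleThreadExecutor())
            .execute(message); // sequential per actor, concurrent across actors
    }
}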

Which Solution Handles Publisher/Subscriber Scenario Better?

The scenario is publisher/subscriber, and I am looking for a solution which can send one message generated by ONE producer to MULTIPLE consumers in real-time. The more lightweight the solution that can handle this scenario, the better!
In the case of AMQP servers I've only checked out RabbitMQ, and to use a RabbitMQ server for the pub/sub pattern, each consumer should declare an anonymous, private queue and bind it to a fanout exchange. So in the case of a thousand users consuming one message in real-time, there will be a thousand or so anonymous queues handled by RabbitMQ.
But I really do not like this RabbitMQ approach. It would be ideal if RabbitMQ could handle this pub/sub scenario with one queue, one message, and many consumers listening on one queue!
What I want to ask is: which AMQP server or other type of solution (anything similar, including XMPP servers, Apache Kafka, ...) handles the pub/sub pattern/scenario better and more efficiently than RabbitMQ, while consuming (of course) fewer server resources?
preferences in order of interest:
in the case of an AMQP-enabled server, handling the pub/sub scenario with only ONE queue or as FEW queues as possible (as explained)
handling thousands of consumers in a lightweight manner, consuming fewer server resources compared to other pub/sub solutions
clustering, tolerating failure of nodes
many language bindings (Python and Java at least)
easy to use and administer
I know my question may be VERY general, but I'd like to hear ideas and suggestions for the pub/sub case.
thanks.
In general, for RabbitMQ, if you put the user in the routing key, you should be able to use a single exchange and then a small number of queues (even a single one if you wanted, but you could divide them up by server or similar if that makes sense given your setup).
If you don't need guaranteed order (as one would for, say, guaranteeing that FK constraints wouldn't get hit for a sequence of changes to various SQL database tables), then there's no reason you can't have a bunch of consumers drawing from a single queue.
If you want a broadcast-message type of scenario, then that could perhaps be handled a bit differently. Instead of the single user in the routing key, which you could use for non-broadcast-type messages, have a special user type, say, __broadcast__, that no user could actually have, and have the users to broadcast to stored in the payload of the message along with the message itself.
Your message processing code could then take care of depositing that message in the database (or whatever the end destination is) across all of those users.
Edit in response to comment from OP:
So the routing key might look something like this message.[user] where [user] could be the actual user if it were a point-to-point message, and a special __broadcast__ user (or similar user name that an actual user would not be allowed to register) which would indicate a broadcast style message.
You could then place the users to which the message should be delivered in the payload of the message, and then that message content (which would also be in the payload) could be delivered to each user. The mechanism for doing that would depend on what your end destination is. i.e. do the messages end up getting stored in Postgres, or Mongo DB or similar?
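A sketch of the publish side under those assumptions (the exchange name, the __broadcast__ marker, and the JSON payload layout are all made up for illustration):

import com.rabbitmq.client.Channel;
import com.rabbitmq.client.Connection;
import com.rabbitmq.client.ConnectionFactory;
import java.nio.charset.StandardCharsets;

ConnectionFactory factory = new ConnectionFactory();
try (Connection connection = factory.newConnection();
     Channel channel = connection.createChannel()) {
    channel.exchangeDeclare("messages", "topic");

    // Point-to-point: the actual user goes in the routing key.
    channel.basicPublish("messages", "message.alice", null,
            "{\"body\":\"hi alice\"}".getBytes(StandardCharsets.UTF_8));

    // Broadcast: reserved pseudo-user in the key, recipients in the payload.
    channel.basicPublish("messages", "message.__broadcast__", null,
            "{\"recipients\":[\"alice\",\"bob\"],\"body\":\"hi all\"}"
                    .getBytes(StandardCharsets.UTF_8));
}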

RabbitMQ - Message order of delivery

I need to choose a new Queue broker for my new project.
This time I need a scalable queue that supports pub/sub, and keeping message ordering is a must.
I read Alexis' comment; he writes:
"Indeed, we think RabbitMQ provides stronger ordering than Kafka"
I read the message ordering section in the RabbitMQ docs:
"Messages can be returned to the queue using AMQP methods that feature a requeue parameter (basic.recover, basic.reject and basic.nack), or due to a channel closing while holding unacknowledged messages... With release 2.7.0 and later it is still possible for individual consumers to observe messages out of order if the queue has multiple subscribers. This is due to the actions of other subscribers who may requeue messages. From the perspective of the queue the messages are always held in the publication order."
If I need to handle messages by their order, can I only use RabbitMQ with an exclusive queue for each consumer?
Is RabbitMQ still considered a good solution for ordered message queuing?
Well, let's take a closer look at the scenario you are describing above. I think it's important to paste the documentation immediately prior to the snippet in your question to provide context:
Section 4.7 of the AMQP 0-9-1 core specification explains the conditions under which ordering is guaranteed: messages published in one channel, passing through one exchange and one queue and one outgoing channel will be received in the same order that they were sent. RabbitMQ offers stronger guarantees since release 2.7.0. Messages can be returned to the queue using AMQP methods that feature a requeue parameter (basic.recover, basic.reject and basic.nack), or due to a channel closing while holding unacknowledged messages. Any of these scenarios caused messages to be requeued at the back of the queue for RabbitMQ releases earlier than 2.7.0. From RabbitMQ release 2.7.0, messages are always held in the queue in publication order, even in the presence of requeueing or channel closure. (emphasis added)
So, it is clear that RabbitMQ, from 2.7.0 onward, is making a rather drastic improvement over the original AMQP specification with regard to message ordering.
With multiple (parallel) consumers, order of processing cannot be guaranteed.
The third paragraph (pasted in the question) goes on to give a disclaimer, which I will paraphrase: "if you have multiple processors in the queue, there is no longer a guarantee that messages will be processed in order." All they are saying here is that RabbitMQ cannot defy the laws of mathematics.
Consider a line of customers at a bank. This particular bank prides itself on helping customers in the order they came into the bank. Customers line up in a queue, and are served by the next of 3 available tellers.
This morning, it so happened that all three tellers became available at the same time, and the next 3 customers approached. Suddenly, the first of the three tellers became violently ill, and could not finish serving the first customer in the line. By the time this happened, teller 2 had finished with customer 2 and teller 3 had already begun to serve customer 3.
Now, one of two things can happen. (1) The first customer in line can go back to the head of the line or (2) the first customer can pre-empt the third customer, causing that teller to stop working on the third customer and start working on the first. This type of pre-emption logic is not supported by RabbitMQ, nor any other message broker that I'm aware of. In either case, the first customer actually does not end up getting helped first - the second customer does, being lucky enough to get a good, fast teller off the bat. The only way to guarantee customers are helped in order is to have one teller helping customers one at a time, which will cause major customer service issues for the bank.
It is not possible to ensure that messages get handled in order in every possible case, given that you have multiple consumers. It doesn't matter if you have multiple queues, multiple exclusive consumers, different brokers, etc. - there is no way to guarantee a priori that messages are answered in order with multiple consumers. But RabbitMQ will make a best-effort.
Message ordering is preserved in Kafka, but only within partitions rather than globally. If your data need both global ordering and partitions, this does make things difficult. However, if you just need to make sure that all of the same events for the same user, etc... end up in the same partition so that they are properly ordered, you may do so. The producer is in charge of the partition that they write to, so if you are able to logically partition your data this may be preferable.
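For illustration, here is a hedged sketch of taking that control explicitly with a custom Partitioner (the per-user keying scheme is made up; in practice, simply setting the record key already gives you equivalent per-key ordering via the default partitioner's murmur2 hash):

import java.util.Map;
import org.apache.kafka.clients.producer.Partitioner;
import org.apache.kafka.common.Cluster;

// Sketch: route every record for the same user to the same partition,
// so all of that user's events stay in order relative to each other.
// Assumes a non-null user-id key.
public class UserPartitioner implements Partitioner {

    @Override
    public int partition(String topic, Object key, byte[] keyBytes,
                         Object value, byte[] valueBytes, Cluster cluster) {
        int numPartitions = cluster.partitionsForTopic(topic).size();
        // Stable hash of the user id; Math.floorMod avoids negative indexes.
        return Math.floorMod(key.hashCode(), numPartitions);
    }

    @Override public void configure(Map<String, ?> configs) {}
    @Override public void close() {}
}

It would be registered on the producer via the partitioner.class config, e.g. props.put("partitioner.class", UserPartitioner.class.getName()).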
I think there are two things in this question which are not similar, consumption order and processing order.
Message queues can, to a degree, guarantee that messages will get consumed in order; they can't, however, give you any guarantees on the order of their processing.
The main difference here is that there are some aspects of message processing which cannot be determined at consumption time, for example:
As mentioned, a consumer can fail while processing. Here the message's consumption order was correct; however, the consumer failed to process it correctly, which will make it go back to the queue. At this point the consumption order is intact, but the processing order is not.
If by "processing" we mean that the message is now discarded and finished processing completely, then consider the case when your processing time is not linear, in other words processing one message takes longer than the other. For example, if message 3 takes longer to process than usual, then messages 4 and 5 might get consumed and finish processing before message 3 does.
So even if you managed to get the message back to the front of the queue (which by the way violates the consumption order) you still cannot guarantee they will also be processed in order.
If you want to process the messages in order:
Have only 1 consumer instance at all times, or a main consumer and several stand-by consumers.
Or don't use a messaging queue and do the processing in a synchronous blocking method, which might sound bad but in many cases and business requirements it is completely valid and sometimes even mission critical.
There are proper ways to guarantee the order of messages within RabbitMQ subscriptions.
If you use multiple consumers, they will process the messages using a shared ExecutorService. See also ConnectionFactory.setSharedExecutor(...). You could set an Executors.newSingleThreadExecutor().
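In the RabbitMQ Java client that could look like this:

import java.util.concurrent.Executors;
import com.rabbitmq.client.ConnectionFactory;

ConnectionFactory factory = new ConnectionFactory();
// Consumers on connections from this factory share one thread,
// so their callbacks are dispatched sequentially.
factory.setSharedExecutor(Executors.newSingleThreadExecutor());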
If you use one Consumer with a single queue, you can bind this queue using multiple bindingKeys (they may have wildcards). The messages will be placed into the queue in the same order that they were received by the message broker.
For example you have a single publisher that publishes messages where the order is important:
try (Connection connection2 = factory.newConnection();
     Channel channel2 = connection2.createChannel()) {
    // publish messages alternating between the two routing keys
    for (int i = 0; i < messageCount; i++) {
        final String routingKey = i % 2 == 0 ? routingEven : routingOdd;
        channel2.basicPublish(exchange, routingKey, null, ("Hello" + i).getBytes(UTF_8));
    }
}
You now might want to receive messages from both topics in a queue in the same order that they were published:
// declare a queue for the consumer
final String queueName = channel.queueDeclare().getQueue();
// bind the queue with the two different routing keys
final String routingEven = "even";
final String routingOdd = "odd";
channel.queueBind(queueName, exchange, routingEven);
channel.queueBind(queueName, exchange, routingOdd);
channel.basicConsume(queueName, true, new DefaultConsumer(channel) { ... });
The Consumer will now receive the messages in the order that they were published, regardless of the fact that you used different topics.
There are some good 5-Minute Tutorials in the RabbitMQ documentation that might be helpful:
https://www.rabbitmq.com/tutorials/tutorial-five-java.html

Looking for message bus implementations that offer something between full ACID and nothing

Anyone know of a message bus implementation which offers granular control over consistency guarantees? Full ACID is too slow and no ACID is too wrong.
We're currently using Rhino ESB wrapping MSMQ for our messaging. When using durable, transactional messaging with distributed transactions, MSMQ can block the commit for considerable time while it waits on I/O completion.
Our messages fall into two general categories: business logic and denormalisation. The latter account for a significant percentage of message bus traffic.
Business logic messages require the guarantees of full ACID and MSMQ has proven quite adequate for this.
Denormalisation messages:
1. MUST be durable.
2. MUST NOT be processed until after the originating transaction completes.
3. MAY be processed multiple times.
4. MAY be processed even if the originating transaction rolls back, as long as (2) is adhered to.
(In some specific cases the durability requirements could probably be relaxed, but identifying and handling those cases as exceptions to the rule adds complexity.)
All denormalisation messages are handled in-process so there is no need for IPC.
If the process is restarted, all transactions may be assumed to have completed (committed or rolled back) and all denormalisation messages not yet processed must be recovered. It is acceptable to replay denormalisation messages which were already processed.
As far as I can tell, messaging systems which deal with transactions tend to offer a choice between full ACID or nothing, and ACID carries a performance penalty. We're seeing calls to TransactionScope#Commit() taking as long as a few hundred milliseconds in some cases depending on the number of messages sent.
Using a non-transactional message queue causes messages to be processed before their originating transaction completes, resulting in consistency problems.
Another part of our system which has similar consistency requirements but lower complexity is already using a custom implementation of something akin to a transaction log, and generalising that for this use case is certainly an option, but I'd rather not implement a low-latency, concurrent, durable, transactional messaging system myself if I don't have to :P
In case anyone's wondering, the reason for requiring durability of denormalisation messages is that detecting desyncs and fixing desyncs can be extremely difficult and extremely expensive respectively. People do notice when something's slightly wrong and a page refresh doesn't fix it, so ignoring desyncs isn't an option.
It's not exactly the answer you're looking for, but Jonathan Oliver has written extensively on how to avoid using distributed transactions in messaging and yet maintain transactional integrity:
http://blog.jonathanoliver.com/2011/04/how-i-avoid-two-phase-commit/
http://blog.jonathanoliver.com/2011/03/removing-2pc-two-phase-commit/
http://blog.jonathanoliver.com/2010/04/idempotency-patterns/
Not sure if this helps you but, hey.
It turns out that MSMQ+SQL+DTC don't even offer the consistency guarantees we need. We previously encountered a problem where messages were being processed before the distributed transaction which queued them had been committed to the database, resulting in out-of-date reads. This is a side-effect of using ReadCommitted isolation to consume the queue, since:
1. Start transaction A.
2. Update database table in A.
3. Queue message in A.
4. Request commit of A.
5. Message queue commits A.
6. Start transaction B.
7. Read message in B.
8. Read database table in B, using ReadCommitted <- gets pre-A data.
9. Database commits A.
Our requirement is that B's read of the table block on A's commit, which requires Serializable transactions, which carries a performance penalty.
It looks like the normal thing to do is indeed to implement the necessary constraints and guarantees oneself, even though it sounds like reinventing the wheel.
Anyone got any comments on this?
If you want to do this by hand, here is a reliable approach. It satisfies (1) and (2), and it doesn't even need the liberties that you allow in (3) and (4).
1. Producer (business logic) starts transaction A.
2. Insert/update whatever into one or more tables.
3. Insert a corresponding message into PrivateMessageTable (part of the domain, and unshared, if you will). This is what will be distributed.
4. Commit transaction A. The producer has now simply and reliably performed its writes, including the insertion of a message, or rolled everything back.
5. A dedicated distributor job queries a batch of unprocessed messages from PrivateMessageTable.
6. The distributor starts transaction B.
7. Mark the unprocessed messages as processed, rolling back if the number of rows modified is different than expected (two instances running at the same time?).
8. Insert a public representation of the messages into PublicMessageTable (a publicly exposed table, in whatever way). Assign new, strictly sequential Ids to the public representations. Because only one process is doing these inserts, this can be guaranteed. Note that the table must be on the same host to avoid 2PC.
9. Commit transaction B. The distributor has now distributed each message to the public table exactly once, with strictly sequential Ids.
10. A consumer (there can be several) queries the next batch of messages from PublicMessageTable with Id greater than its own LastSeenId.
11. The consumer starts transaction C.
12. The consumer inserts its own representation of the messages into its own table, ConsumerMessageTable (thus advancing LastSeenId). Insert-ignore can help protect against multiple instances running. Note that this table can be on a completely different server.
13. Commit transaction C. The consumer has now consumed each message exactly once, in the same order the messages were made publicly available, without ever skipping a message.
14. We can do whatever we want based on the consumed messages.
Of course, this requires very careful implementation.
It is even suitable for database clusters, as long as there is only a single write node, and both reads and writes perform causality checks. It may well be that having one of these is sufficient, but I'd have to consider the implications more carefully to make that claim.
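A minimal JDBC sketch of the producer side (steps 1 through 4; the table and column names are made up for illustration):

import java.sql.Connection;
import java.sql.PreparedStatement;

// Sketch of steps 1-4: the business write and the outgoing message are
// committed atomically in one local transaction, so no 2PC is needed.
void produce(Connection db, String payload) throws Exception {
    db.setAutoCommit(false);
    try {
        try (PreparedStatement update = db.prepareStatement(
                "UPDATE account SET balance = balance - 10 WHERE id = 1");
             PreparedStatement enqueue = db.prepareStatement(
                "INSERT INTO private_message_table (payload, processed) VALUES (?, FALSE)")) {
            update.executeUpdate();
            enqueue.setString(1, payload);
            enqueue.executeUpdate();
        }
        db.commit(); // both writes or neither
    } catch (Exception e) {
        db.rollback();
        throw e;
    }
}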