I am trying to understand how durable subscriptions work in ActiveMQ Artemis. Currently my biggest question is about storage.
I want to know whether the messages are duplicated, meaning the message is stored to disk once per consumer, or whether the messages are stored in one place and each consumer only keeps track of the message at which it was disconnected and needs to resume.
From my tests I can see that, with a fixed message size and number of published messages, the space taken on disk is the same whether I have 1, 2 or 3 durable subscriptions. I took care to disconnect the subscribers so that the messages are actually stored. This would lead me to think that each queue only knows the index of the message from which the consumer will need to resume when it comes back.
On the other hand, I checked write bytes per second with the iostat command, and the more durable subscription queues I have, the higher this figure grows, which would mean the messages are duplicated.
Does anyone have more information, or even a pointer to the source code that is responsible for this?
By default ActiveMQ Artemis will store durable messages in a set of local files known as a "journal." In situations where more than one queue has the same message (e.g. in the case of multiple durable subscriptions on the same JMS topic) the actual message data is only stored once and each queue gets a "reference" to that message. Storing the exact same message data multiple times would be hugely inefficient.
However, it's worth noting that the ActiveMQ Artemis journal files are initialized with zeroes when they are created, which means that even an "empty" journal takes up space on the disk. Therefore, when messages are stored in the journal the amount of disk space used will not change; the existing zeroes are simply overwritten with the message data.
You can find the source code for ActiveMQ Artemis on GitHub.
If you want to see proof of this behavior you can use the artemis data print command, which prints the raw records from the journal in a human-readable format. If you were to have 2 durable subscriptions on a JMS topic named exampleTopic, using client IDs of durable-client1 and durable-client2 and subscription names of subscriber-1 and subscriber-2 respectively, and you sent one message, you would end up with an address, 2 queues, 1 actual message, and 2 references in the journal. You'd see something like this in the data print command output:
********************************************
B I N D I N G S J O U R N A L
********************************************
...
### Surviving Records Summary ###
...
recordID=2;userRecordType=44;isUpdate=false;compactCount=0;PersistentAddressBindingEncoding [id=2, name=exampleTopic, routingTypes={MULTICAST}, autoCreated=false]
recordID=18;userRecordType=21;isUpdate=false;compactCount=0;PersistentQueueBindingEncoding [id=18, name=durable-client1.subscriber-1, address=exampleTopic, filterString=null, user=null, autoCreated=false, maxConsumers=-1, purgeOnNoConsumers=false, exclusive=false, lastValue=false, lastValueKey=null, nonDestructive=false, consumersBeforeDispatch=0, delayBeforeDispatch=-1, routingType=0, configurationManaged=false, groupRebalance=false, groupBuckets=-1, groupFirstKey=null, autoDelete=false, autoDeleteDelay=0, autoDeleteMessageCount=0]
recordID=23;userRecordType=21;isUpdate=false;compactCount=0;PersistentQueueBindingEncoding [id=23, name=durable-client1.subscriber-2, address=exampleTopic, filterString=null, user=null, autoCreated=false, maxConsumers=-1, purgeOnNoConsumers=false, exclusive=false, lastValue=false, lastValueKey=null, nonDestructive=false, consumersBeforeDispatch=0, delayBeforeDispatch=-1, routingType=0, configurationManaged=false, groupRebalance=false, groupBuckets=-1, groupFirstKey=null, autoDelete=false, autoDeleteDelay=0, autoDeleteMessageCount=0]
...
********************************************
M E S S A G E S J O U R N A L
********************************************
...
### Surviving Records Summary ###
recordID=27;userRecordType=45;isUpdate=false;compactCount=0;Message(messageID=27;userMessageID=41705395-b2d1-11e9-91f9-a0afbd82eaba;msg=CoreMessage[messageID=27,durable=true,userID=41705395-b2d1-11e9-91f9-a0afbd82eaba,priority=4, timestamp=Tue Jul 30 08:52:22 CDT 2019,expiration=0, durable=true, address=exampleTopic,size=232,properties=TypedProperties[__AMQ_CID=durable-client1,_AMQ_ROUTING_TYPE=0]]#454305524
recordID=27;userRecordType=32;isUpdate=true;compactCount=0;AddRef;QueueEncoding [queueID=18]
recordID=27;userRecordType=32;isUpdate=true;compactCount=0;AddRef;QueueEncoding [queueID=23]
...
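In case it helps to reproduce this, here is a minimal JMS sketch that sets up the scenario described above (the broker URL is assumed to be the default tcp://localhost:61616; the client IDs, subscription names, and topic name are just the example values from this answer):

import javax.jms.Connection;
import javax.jms.ConnectionFactory;
import javax.jms.MessageProducer;
import javax.jms.Session;
import org.apache.activemq.artemis.jms.client.ActiveMQConnectionFactory;

public class DurableSubExample {
    public static void main(String[] args) throws Exception {
        ConnectionFactory cf = new ActiveMQConnectionFactory("tcp://localhost:61616");

        // create the first durable subscription, then disconnect
        try (Connection c = cf.createConnection()) {
            c.setClientID("durable-client1");
            Session s = c.createSession(false, Session.AUTO_ACKNOWLEDGE);
            s.createDurableSubscriber(s.createTopic("exampleTopic"), "subscriber-1");
        }

        // create the second durable subscription, then disconnect
        try (Connection c = cf.createConnection()) {
            c.setClientID("durable-client2");
            Session s = c.createSession(false, Session.AUTO_ACKNOWLEDGE);
            s.createDurableSubscriber(s.createTopic("exampleTopic"), "subscriber-2");
        }

        // send a single persistent message; both subscription queues get a reference to it
        try (Connection c = cf.createConnection()) {
            Session s = c.createSession(false, Session.AUTO_ACKNOWLEDGE);
            MessageProducer producer = s.createProducer(s.createTopic("exampleTopic"));
            producer.send(s.createTextMessage("hello"));
        }
    }
}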
Related
I'm working on a generic CQRS + ES framework (with Node.js) at my company. Remark: only an RDBMS + Redis (without AOF/RDB persistence) are allowed, for reasons outside my control.
I really need some advice on how to implement this CQRS + ES framework.
Ignoring the ES part, I'm struggling with the implementation of the message propagation.
Here are the tables I have in the RDBMS:
EventStore: [aggregateId (varchar), aggregateType (varchar), aggregateVersion (bigint), messageId (varchar), messageData (varchar), messageMetadata (varchar), sequenceNumber (bigint)]
EventDelivery: [messageId (varchar, foreign key to EventStore), sequenceId (equal to aggregateId, varchar), sequenceNumber (equal to the one in EventStore, bigint)]
ConsumerGroup: [consumerGroup (varchar), lastSequenceNumberSeen (bigint)]
And I have multiple EventSubscribers:
// In Application 1
#EventSubscriber("consumerGroup1", AccountOpenedEvent)
...
// In Application 2
#EventSubscriber("consumerGroup2", AccountOpenedEvent)
...
Here is the flow when an AccountOpenedEvent is written to the EventStore table.
For each application (i.e. application 1 and application 2), it will scan the codebase to obtain all the #EventSubscribers, create a consumer group in the ConsumerGroup table with lastSequenceNumberSeen = 0, then run a scheduler (with a 100 ms polling interval) to poll all the interesting events (grouped by consumer group) from EventStore with the condition sequenceNumber >= lastSequenceNumberSeen (a rough sketch of this query is shown after these steps).
For each event (EventStore) in step 1, calculate the sequenceId (here the sequenceId is equal to the aggregateId); this sequenceId (together with the sequenceNumber) is used to guarantee the message delivery ordering. Persist it into the EventDelivery table, and update lastSequenceNumberSeen = sequenceNumber (this is to prevent duplicate events being scanned in the next interval).
For each application (i.e. application 1 and application 2), we have another scheduler (also with a 100 ms polling interval) to poll the EventDelivery table (grouped by sequenceId and ordered by sequenceNumber ASC).
For each event (EventDelivery) in step 3, call the corresponding message handler; after the message is handled, acknowledge it by deleting the record from EventDelivery.
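To make step 1 concrete, here is a rough sketch of the kind of poll query I mean (illustrated with JDBC-style Java only because it is compact; the real implementation would be in Node.js, and the table/column names are the ones listed above):

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

class EventStorePoller {
    // one poll iteration for one consumer group; returns the new lastSequenceNumberSeen
    long pollOnce(Connection conn, long lastSequenceNumberSeen) throws SQLException {
        String sql = "SELECT aggregateId, sequenceNumber, messageId, messageData "
                   + "FROM EventStore WHERE sequenceNumber >= ? ORDER BY sequenceNumber ASC";
        try (PreparedStatement ps = conn.prepareStatement(sql)) {
            ps.setLong(1, lastSequenceNumberSeen);
            try (ResultSet rs = ps.executeQuery()) {
                while (rs.next()) {
                    String sequenceId = rs.getString("aggregateId"); // sequenceId == aggregateId
                    long sequenceNumber = rs.getLong("sequenceNumber");
                    // step 2: insert a row into EventDelivery for this consumer group here
                    lastSequenceNumberSeen = sequenceNumber;
                }
            }
        }
        return lastSequenceNumberSeen;
    }
}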
Since I have 2 applications, I have to handle the AccountOpenedEvent in EventStore in 2 separate transactions; since the 2 applications don't know about each other, I can only do it passively. That's why I need the EventDelivery table and the polling scheduler.
Assume I can use Redlock + cron to make sure there is only 1 instance doing the polling jobs, in case application 1 has more than 1 replica.
Application 1 will poll the AccountOpenedEvent, create a record in EventDelivery, and store the lastSequenceNumberSeen in its consumer group.
Application 2 will also poll the AccountOpenedEvent, create a record in EventDelivery, and store the lastSequenceNumberSeen in its consumer group.
Since application 1 and application 2 are different consumer groups, they treat the event store stream separately.
Here is one problem: we have 2 schedulers, and we would have more if there were more consumer groups, and this puts a heavy load on the database. How do I solve this? One of my ideas is to convert these 2 schedulers into jobs and put the jobs into a queue that processes them per interval (let's say 100 ms), but it seems like this would introduce large latency if a job is unfortunately placed at the end of the queue.
Here is the 2nd problem: in the above flow, I introduced the 2nd polling job to guarantee the message delivery ordering. But unlike the first one, it doesn't have a lastSequenceNumberSeen; the 2nd polling job removes the record from EventDelivery once the message is handled. But it is common for a message to take more than 100 ms to handle, and in that case the same event in EventDelivery will be scanned again.
I'm not sure what the common practice is, and I'm quite stuck on how to implement this. I did lots of research on the internet. I see that some implementations do the message propagation using Debezium + Kafka (although I cannot use these 2 tools, I still don't understand how it works).
I know Debezium uses a CDC approach to tail the transaction logs of the RDBMS and forward the messages to Kafka, and I have seen recommendations that we should not have multiple subscriptions on the same transaction log. Let's say Debezium guarantees that the events are propagated to Kafka; that means I need application 1 and application 2 to subscribe to the Kafka topic, each in its own consumer group (and using aggregateId as the partition key). Since Kafka guarantees the message ordering, everything should work fine. But I don't think Kafka stores all the messages from the very beginning. Let's say it is configured to retain 1,000,000 messages: when a message handler keeps failing for some unexpected reason, the 1,000,000 messages after this failed message cannot be handled, and the 1,000,001st event will get lost... Although this is a rare case, I'm not sure I understand it correctly. The database table is the most reliable source to trust, as it stores all the events from the very beginning. If the system suffers from this case, does that mean I need to manually republish all the events to Kafka to recover the projection model?
And another case: if I have a new event subscriber, which needs the historical events to build its projection model, with Debezium + Kafka do we need to assign a new consumerGroup and configure it to read the Kafka stream from the very beginning? It has the same problem that the consumerGroup can only get the last 1,000,000 events... But this is not a problem if we poll the database table directly instead.
I don't understand why most implementations don't poll the database table but make use of a message broker instead.
Again, I really need advice on how to implement a CQRS + ES framework, especially the message propagation part (keep in mind I can only use an RDBMS + Redis without persistence).
I am working with a microservice that consumes messages from Kafka. It does some processing on the message and then inserts the result into a database. Only then do I acknowledge the message with Kafka.
It is required that I keep data loss to an absolute minimum while keeping recovery quick (avoiding reprocessing messages, because it is expensive).
I realized that if there were some kind of failure, such as my microservice crashing, my messages would be reprocessed. So I thought to add some kind of 'checkpoint' to my process by writing the state of the transformed message to a file and reading from it after a failure. I thought this would mean that I could move my Kafka commit to an earlier stage, right after writing to the file succeeds.
But then, upon further thinking, I realized that if there were a failure in the file system, I might not find my files; e.g. a cloud file service might still have a chance of failure even if the marketed availability is >99%. I might end up in an inconsistent state where I have data in my Kafka topic (which is inaccessible because the Kafka offset has been committed) but I have lost my file on the file system. This made me realize that I should send the Kafka commit at a later stage.
So now, considering the above two design decisions, it feels like there is a tradeoff between not losing data and minimizing time to recover from a failure. Am I being unrealistic in my concerns? Is there some design pattern that I can follow to minimize the tradeoff? How do I reason about this situation? I thought that maybe the Saga pattern is appropriate here, but am I overcomplicating things?
If you are that concerned about reprocessing data, you could always follow the pattern of storing the offsets outside of Kafka.
For example, in your consumer-worker reading loop:
(pseudocode)
while (...)
{
    MessageAndOffset = getMsg();
    // do your things
    saveOffsetInQueueToDB(offset);
}
saveOffsetInQueueToDB is responsible for adding the offset to a queue/list, or whatever structure you choose. This operation is only done once the message has been correctly processed.
Periodically, when a certain number of offsets are stored, or when a shutdown is detected, you could implement another function that stores the offsets for each topic/partition in:
An external database.
An external SLA-backed storage system, such as S3 or Azure Blob Storage.
Internal (disk) and remote loggers.
If you are concerned about failures, you could use a combination of two of those three options (or even use all three).
Storing these in a "memory buffer" allows the operation to be async, so there's no need for a new transfer/connection to the database/datalake/log for each processed message.
If there's a crash, you could read all messages from the beginning (the easiest way is just changing the group.id and setting auto.offset.reset to earliest), but discard those whose offset is already in the database, avoiding the reprocess. For example, by adding a condition in your loop (yep, pseudocode again):
while (...)
{
    MessageAndOffset = getMsg();
    if (offset.notIncluded(offsetListFromDB))
    {
        // do your things
        saveOffsetInQueueToDB(offset);
    }
}
You could implement a better-performing algorithm than this "not included" check, for example by storing the last read offset for each partition in a HashMap and then just checking whether the offset of the incoming message for that partition is bigger than the stored one. For example, say partition 0's last offset was 558 and partition 1's was 600:
// offsetMap = {[0,558],[1,600]}
while (...)
{
    MessageAndOffset = getMsg();
    // get partition => 0
    if (offset > offsetMap.get(partition))
    {
        // do your things
        saveOffsetInQueueToDB(offset);
    }
}
This way, you guarantee that only the non-processed messages from each partition will be processed.
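For what it's worth, here is a slightly more concrete version of that idea using the plain Java Kafka consumer API (a sketch only; the topic name, group id, and the loadOffsetMapFromDB/saveOffsetInQueueToDB helpers are placeholders for whatever storage you choose):

import java.time.Duration;
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class OffsetMapConsumer {
    // placeholder: load {partition -> last processed offset} from your database
    static Map<Integer, Long> loadOffsetMapFromDB() { return new HashMap<>(); }
    // placeholder: persist a processed offset (ideally in the same transaction as the result)
    static void saveOffsetInQueueToDB(int partition, long offset) { /* ... */ }

    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "reprocessing-group");  // fresh group.id
        props.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest");   // read from the beginning
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG,
                  "org.apache.kafka.common.serialization.StringDeserializer");
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG,
                  "org.apache.kafka.common.serialization.StringDeserializer");

        Map<Integer, Long> offsetMap = loadOffsetMapFromDB(); // e.g. {0=558, 1=600}

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singletonList("my-topic"));
            while (true) {
                for (ConsumerRecord<String, String> record : consumer.poll(Duration.ofMillis(500))) {
                    long lastProcessed = offsetMap.getOrDefault(record.partition(), -1L);
                    if (record.offset() > lastProcessed) {
                        // do your things
                        saveOffsetInQueueToDB(record.partition(), record.offset());
                        offsetMap.put(record.partition(), record.offset());
                    }
                }
            }
        }
    }
}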
Regarding file system failures, that's why Kafka comes as a cluster: fault tolerance in Kafka is achieved by copying the partition data to other brokers, which are known as replicas.
So if you have 5 brokers and a replication factor of 5, for example, you would have to experience 5 different system failures at the same time (assuming the brokers are on separate hosts) in order to lose any data. Even 4 brokers could fail at the same time without losing any data.
With that setup all brokers hold the same partitions and the same amount of data, so if a filesystem error occurs on one of the brokers, the others still hold all the information.
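For reference, this is roughly how that replication is requested when creating a topic with the Kafka Java AdminClient (a sketch; the broker addresses, topic name, and partition/replica counts are placeholders):

import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.NewTopic;

public class CreateReplicatedTopic {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "broker1:9092,broker2:9092");

        try (AdminClient admin = AdminClient.create(props)) {
            // 6 partitions, each copied to 3 brokers: any 2 of those brokers can fail
            // without losing this topic's data
            NewTopic topic = new NewTopic("my-topic", 6, (short) 3);
            admin.createTopics(Collections.singletonList(topic)).all().get();
        }
    }
}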
And so, I have some missing messages in Apache ActiveMQ Artemis (for more information, my previous question is located here: Apache ActiveMQ Artemis how to investigate if messages were lost?).
After reviewing the journal records I see these entries related to the lost message (recordID=1094612593). What can I deduce from these entries, and can they be helpful in further troubleshooting?
operation#AddRecordTX;txID=1094612560,recordID=1094612593;userRecordType=45;isUpdate=false;compactCount=0;Message(messageID=1094612593;userMessageID=aa844c3f-1c6e-11ec-840c-005056be4b8b;msg=CoreMessage[messageID=1094612593,durable=true,userID=aa844c3f-1c6e-11ec-840c-005056be4b8b,priority=4, timestamp=Thu Sep 23 16:03:37 EEST 2021,expiration=0, durable=true, address=[===myQueue===],size=1277,properties=TypedProperties[===PROPERTIES==]]#1335046207
operation#UpdateTX;txID=1094612540,recordID=1094612663;userRecordType=32;isUpdate=true;compactCount=0;AddRef;QueueEncoding [queueID=7]
operation#UpdateTX;txID=1094612655,recordID=1094612663;userRecordType=33;isUpdate=true;compactCount=0;ACK;QueueEncoding [queueID=7]
operation#DeleteRecord;recordID=1094612663
P.S.
I've tried to reproduce the loss situation, but to no avail.
The data here is inconclusive as the records don't directly relate to one another. Let's look at each record one by one...
operation#AddRecordTX;txID=1094612560,recordID=1094612593;userRecordType=45;isUpdate=false;compactCount=0;Message(messageID=1094612593;userMessageID=aa844c3f-1c6e-11ec-840c-005056be4b8b;msg=CoreMessage[messageID=1094612593,durable=true,userID=aa844c3f-1c6e-11ec-840c-005056be4b8b,priority=4, timestamp=Thu Sep 23 16:03:37 EEST 2021,expiration=0, durable=true, address=[===myQueue===],size=1277,properties=TypedProperties[===PROPERTIES==]]#1335046207
This is an "add message" record that contains the actual message data (i.e. the body and the properties & headers). The ID for this record (i.e. 1094612593) will be reference by other records which relate to this message.
operation#UpdateTX;txID=1094612540,recordID=1094612663;userRecordType=32;isUpdate=true;compactCount=0;AddRef;QueueEncoding [queueID=7]
This is an "add ref" record. Since a single message can actually be on multiple queues (e.g. in several subscriptions on a JMS topic) the message data isn't duplicated each time. Instead a "ref" is added to each queue (i.e. queueID=7 in this case), and each "ref" points back to the ID of the actual message (i.e. 1094612663 here). In this case 1094612663 doesn't match the "add message" record ID of 1094612593 so these journal entries are related to 2 different messages.
operation#UpdateTX;txID=1094612655,recordID=1094612663;userRecordType=33;isUpdate=true;compactCount=0;ACK;QueueEncoding [queueID=7]
This is an "ack" record which indicates that a message was acknowledged. Messages can be acknowledged by a client (e.g. during normal consumption) or they can be acknowledged administratively (e.g. during a delete operation via the management API).
operation#DeleteRecord;recordID=1094612663
This is a "delete" record which is added to the journal once all the "refs" of a message have been acknowledged. The recordID refers back to the original "add message" record.
Later, during a process called "compaction", all of the delete records will be cleaned up along with the records they reference, including the ref and ack records. In this way usable space within the journal file can be freed and re-used.
[org.jgroups.protocols.pbcast.NAKACK] (requester=, local_addr=) message ::port not found in retransmission table of :port:
(size=xxxx, missing=x, highest stability=xxxxx)]
NAKACK (or its newer cousin, NAKACK2) provides reliable transmission of messages to the cluster. To do this, every message gets a sequence number (seqno) and receivers deliver the messages to the application in seqno order.
Every cluster member has a table of all other members and their messages (conceptually a list). When member P sends messages P21, P22 and P23, a receiver R first looks up the message list for P, then adds P21-P23 to that list.
However, in your case, the list for the sender was not found. This means that the sender was not a cluster member (anymore).
For example, if we have cluster {P,Q,R,T}, and member P leaves or is excluded because it was suspected (e.g. we didn't receive a heartbeat for a period of time), then messages P21-P23 will be dropped by any receiver.
This is because JGroups only allows cluster members to send and receive messages.
How can a member get excluded?
This is likely done by one of the failure detection protocols (e.g. FD_ALL or FD).
Another possibility is that your thread pools were clogged and failure detection heartbeat messages were dropped, leading to false suspicions.
Also, long GC pauses can cause this.
Fixes:
Increase the timeouts in FD_ALL or FD. The timeout should be longer than the longest GC cycle. Note that it will now take longer to detect hung members.
Size your thread pools, e.g. make sure that the maximum number of threads is large enough and that the queue is disabled.
Note that false suspicions can happen, but MERGE3 should remedy a split cluster later on.
I need to choose a new Queue broker for my new project.
This time I need a scalable queue that supports pub/sub, and keeping message ordering is a must.
I read Alexis' comment, where he writes:
"Indeed, we think RabbitMQ provides stronger ordering than Kafka"
I read the message ordering section in the RabbitMQ docs:
"Messages can be returned to the queue using AMQP methods that feature
a requeue
parameter (basic.recover, basic.reject and basic.nack), or due to a channel
closing while holding unacknowledged messages...With release 2.7.0 and later
it is still possible for individual consumers to observe messages out of
order if the queue has multiple subscribers. This is due to the actions of
other subscribers who may requeue messages. From the perspective of the queue
the messages are always held in the publication order."
If I need to handle messages in their original order, can I only use RabbitMQ with an exclusive queue for each consumer?
Is RabbitMQ still considered a good solution for ordered message queuing?
Well, let's take a closer look at the scenario you are describing above. I think it's important to paste the documentation immediately prior to the snippet in your question to provide context:
Section 4.7 of the AMQP 0-9-1 core specification explains the conditions under which ordering is guaranteed: messages published in one channel, passing through one exchange and one queue and one outgoing channel will be received in the same order that they were sent. RabbitMQ offers stronger guarantees since release 2.7.0. Messages can be returned to the queue using AMQP methods that feature a requeue parameter (basic.recover, basic.reject and basic.nack), or due to a channel closing while holding unacknowledged messages. Any of these scenarios caused messages to be requeued at the back of the queue for RabbitMQ releases earlier than 2.7.0. From RabbitMQ release 2.7.0, messages are always held in the queue in publication order, even in the presence of requeueing or channel closure. (emphasis added)
So, it is clear that RabbitMQ, from 2.7.0 onward, is making a rather drastic improvement over the original AMQP specification with regard to message ordering.
With multiple (parallel) consumers, order of processing cannot be guaranteed.
The third paragraph (pasted in the question) goes on to give a disclaimer, which I will paraphrase: "if you have multiple processors in the queue, there is no longer a guarantee that messages will be processed in order." All they are saying here is that RabbitMQ cannot defy the laws of mathematics.
Consider a line of customers at a bank. This particular bank prides itself on helping customers in the order they came into the bank. Customers line up in a queue, and are served by the next of 3 available tellers.
This morning, it so happened that all three tellers became available at the same time, and the next 3 customers approached. Suddenly, the first of the three tellers became violently ill, and could not finish serving the first customer in the line. By the time this happened, teller 2 had finished with customer 2 and teller 3 had already begun to serve customer 3.
Now, one of two things can happen. (1) The first customer in line can go back to the head of the line or (2) the first customer can pre-empt the third customer, causing that teller to stop working on the third customer and start working on the first. This type of pre-emption logic is not supported by RabbitMQ, nor any other message broker that I'm aware of. In either case, the first customer actually does not end up getting helped first - the second customer does, being lucky enough to get a good, fast teller off the bat. The only way to guarantee customers are helped in order is to have one teller helping customers one at a time, which will cause major customer service issues for the bank.
It is not possible to ensure that messages get handled in order in every possible case, given that you have multiple consumers. It doesn't matter if you have multiple queues, multiple exclusive consumers, different brokers, etc. - there is no way to guarantee a priori that messages are answered in order with multiple consumers. But RabbitMQ will make a best-effort.
Message ordering is preserved in Kafka, but only within partitions rather than globally. If your data need both global ordering and partitions, this does make things difficult. However, if you just need to make sure that all of the same events for the same user, etc... end up in the same partition so that they are properly ordered, you may do so. The producer is in charge of the partition that they write to, so if you are able to logically partition your data this may be preferable.
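For example, here is a rough sketch of keying a Kafka producer by user ID so that all of that user's events land in the same partition and therefore stay in order relative to each other (the broker address and topic name are placeholders):

import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

public class UserEventProducer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            String userId = "user-42";
            // with the default partitioner, records sharing the same key always go to the
            // same partition, so one user's events stay ordered relative to each other
            producer.send(new ProducerRecord<>("user-events", userId, "AccountOpened"));
            producer.send(new ProducerRecord<>("user-events", userId, "AccountVerified"));
        }
    }
}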
I think there are two things in this question which are not the same: consumption order and processing order.
Message Queues can -to a degree- give you a guarantee that messages will get consumed in order, they can't, however, give you any guarantees on the order of their processing.
The main difference here is that there are some aspects of message processing which cannot be determined at consumption time, for example:
As mentioned, a consumer can fail while processing; here the message's consumption order was correct, but the consumer failed to process it correctly, which sends it back to the queue. At this point the consumption order is intact, but the processing order is not.
If by "processing" we mean that the message is now discarded and completely finished, then consider the case where your processing time is not constant; in other words, processing one message takes longer than another. For example, if message 3 takes longer than usual to process, then messages 4 and 5 might get consumed and finish processing before message 3 does.
So even if you managed to get the message back to the front of the queue (which, by the way, violates the consumption order), you still cannot guarantee that messages will be processed in order.
If you want to process the messages in order:
Have only 1 consumer instance at all times, or a main consumer and several stand-by consumers (see the sketch after this list).
Or don't use a message queue and do the processing in a synchronous, blocking method, which might sound bad, but in many cases and business requirements it is completely valid and sometimes even mission-critical.
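As a side note, if you want one active consumer with automatic stand-bys on RabbitMQ, the single-active-consumer queue argument (available since RabbitMQ 3.8) roughly matches the first option above. A minimal sketch, assuming a local broker and a hypothetical queue name:

import java.util.Collections;
import com.rabbitmq.client.AMQP;
import com.rabbitmq.client.Channel;
import com.rabbitmq.client.Connection;
import com.rabbitmq.client.ConnectionFactory;
import com.rabbitmq.client.DefaultConsumer;
import com.rabbitmq.client.Envelope;

public class SingleActiveConsumerExample {
    public static void main(String[] args) throws Exception {
        ConnectionFactory factory = new ConnectionFactory();
        factory.setHost("localhost");

        try (Connection connection = factory.newConnection();
             Channel channel = connection.createChannel()) {
            // the broker picks one active consumer per queue; the others act as stand-bys
            channel.queueDeclare("ordered-work", true, false, false,
                    Collections.<String, Object>singletonMap("x-single-active-consumer", true));
            channel.basicConsume("ordered-work", true, new DefaultConsumer(channel) {
                @Override
                public void handleDelivery(String consumerTag, Envelope envelope,
                                           AMQP.BasicProperties properties, byte[] body) {
                    // messages arrive here via a single active consumer, in queue order
                }
            });
            Thread.sleep(60_000); // keep the consumer alive for the demo
        }
    }
}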
There are proper ways to guarantee the order of messages within RabbitMQ subscriptions.
If you use multiple consumers, they will process the messages using a shared ExecutorService. See also ConnectionFactory.setSharedExecutor(...). You could set an Executors.newSingleThreadExecutor().
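For instance, a minimal sketch of that single-thread setup (only the factory configuration is shown; consumer callbacks on connections created from this factory will then be dispatched one at a time):

import java.util.concurrent.Executors;
import com.rabbitmq.client.ConnectionFactory;

public class SingleThreadCallbacks {
    public static void main(String[] args) {
        ConnectionFactory factory = new ConnectionFactory();
        // consumer callbacks for connections created from this factory
        // will be dispatched on a single shared thread
        factory.setSharedExecutor(Executors.newSingleThreadExecutor());
    }
}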
If you use one consumer with a single queue, you can bind this queue using multiple binding keys (they may have wildcards). The messages will be placed into the queue in the same order that they were received by the message broker.
For example you have a single publisher that publishes messages where the order is important:
try (Connection connection2 = factory.newConnection();
     Channel channel2 = connection2.createChannel()) {
    // publish messages alternating between two different routing keys
    for (int i = 0; i < messageCount; i++) {
        final String routingKey = i % 2 == 0 ? routingEven : routingOdd;
        channel2.basicPublish(exchange, routingKey, null, ("Hello" + i).getBytes(UTF_8));
    }
}
You now might want to receive the messages from both routing keys in a single queue, in the same order that they were published:
// declare a queue for the consumer
final String queueName = channel.queueDeclare().getQueue();
// we bind to queue with the two different routingKeys
final String routingEven = "even";
final String routingOdd = "odd";
channel.queueBind(queueName, exchange, routingEven);
channel.queueBind(queueName, exchange, routingOdd);
channel.basicConsume(queueName, true, new DefaultConsumer(channel) { ... });
The consumer will now receive the messages in the order that they were published, regardless of the fact that you published with different routing keys.
There are some good 5-Minute Tutorials in the RabbitMQ documentation that might be helpful:
https://www.rabbitmq.com/tutorials/tutorial-five-java.html