RabbitMQ Durability - queue

I'm using rabbitMQ on docker.
When executing the rabbitmq, I want to set the message durability (durable/transient).
Is there any way to set up durability? (except when declare Queue and Exchange)

Yes it is possible to specify delivery-mode message attribute for any published message. However, the target queue must be also durable for a message to be persisted.
See chapter Message Attributes and Payload in RabbitMQ documenation:
Messages in the AMQP model have attributes. Some attributes are so
common that the AMQP 0-9-1 specification defines them and application
developers do not have to think about the exact attribute name. Some
examples are
Content type
Content encoding
Routing key
Delivery mode (persistent or not)
Message priority
Message publishing timestamp
Expiration period
Publisher application id
Simply publishing a
message to a durable exchange or the fact that the queue(s) it is
routed to are durable doesn't make a message persistent: it all
depends on persistence mode of the message itself. Publishing messages
as persistent affects performance (just like with data stores,
durability comes at a certain cost in performance).

Related

Message order issue in single consumer connected to ActiveMQ Artemis queue

Any possibility of message order issue while receive single queue consumer and multiple producer?
producer1 publish message m1 at 2021-06-27 02:57:44.513 and producer2 publish message m2 at 2021-06-27 02:57:44.514 on same queue worker_consumer_queue. Client code connected to the queue configured as single consumer should receive message in order m1 first and then m2 correct? Sometimes message receive in wrong order. version is ActiveMQ Artemis 2.17.0.
Even though I mentioned that multiple producer, message publish one after another from same thread using property blockOnDurableSend=false.
I create and close producer on each message publish. On same JVM, my assumption is order of published messages in queue, from same thread or from different threads even with async. timestamp is getJMSTimestamp(). async publish also maintain any internal queue has order?
If you use blockOnDurableSend=false you're basically saying you don't strictly care about the order or even if the message makes it to the broker at all. Using blockOnDurableSend=false basically means "fire and forget."
Furthermore, the JMSTimetamp is not when the message is actually sent as noted in the javax.jms.Message JavaDoc:
The JMSTimestamp header field contains the time a message was handed off to a provider to be sent. It is not the time the message was actually transmitted, because the actual send may occur later due to transactions or other client-side queueing of messages.
With more than one producer there is no guarantee that the messages will be processed in order.
More producers, ActiveMQ Artemis and one consumer are a distributed system and the lack of a global clock is a significant characteristic of distributed systems.
Even if producers and ActiveMQ Artemis were on the same machine and used the same clock, ActiveMQ Artemis could not receive the messages in the same order producers would create and send their messages. Because the time to create a message and the time to send a message include variable time latencies.
The easiest solution is to trust the order of the messages received by ActiveMQ Artemis, adding a timestamp with an interceptor or enabling the ingress timestamp, see ARTEMIS-2919 for further details.
If the easiest solution doesn't work, the distributed solution is to implement a distributed system total ordering algorithm as lamport timestamps.
Well, as it seams it is not a bug within Artemis, when it comes to a millisecond difference it is more like a network lag or something like this.
So to workaround I got to the idea, you could create a algorythm in which a recieved message will wait for ~100ms before it is really worked through (whatever you want to be doing with this message) and check if there is another message which your application recieved afterwards but is send before. So basicly have your own receiver queue with a delay.
IF there is message that was before, you could simply move that up in your personal algorythm. You could also think about to reject the first message back to your bus, depending on your settings on queues and topics it would be able to recieve it afterwards again.

Can two synchronous clients use same request/reply topic

I have a User service which is listening to a request topic and returning the User object. I need to call this service synchronously from two different services and wanted to confirm if its ok for them to use the same request/response topic names to both request the User object?
See the documentation.
There are two options when using the same reply topic:
Discard unexpected replies:
When configuring with a single reply topic, each instance must use a different group.id. In this case, all instances receive each reply, but only the instance that sent the request finds the correlation ID. This may be useful for auto-scaling, but with the overhead of additional network traffic and the small cost of discarding each unwanted reply. When you use this setting, we recommend that you set the template’s sharedReplyTopic to true, which reduces the logging level of unexpected replies to DEBUG instead of the default ERROR.
Use dedicated partitions:
If you have multiple client instances and you do not configure them as discussed in the preceding paragraph, each instance needs a dedicated reply topic. An alternative is to set the KafkaHeaders.REPLY_PARTITION and use a dedicated partition for each instance. The Header contains a four-byte int (big-endian). The server must use this header to route the reply to the correct partition (#KafkaListener does this). In this case, though, the reply container must not use Kafka’s group management feature and must be configured to listen on a fixed partition (by using a TopicPartitionOffset in its ContainerProperties constructor).
Multiple consumers can read and process same message from same topic if both consumers are in different consumer group.

What is Kafka message tweaking?

From https://data-flair.training/blogs/advantages-and-disadvantages-of-kafka/:
As we know, the broker uses certain system calls to deliver messages to the consumer. However, Kafka’s performance reduces significantly if the message needs some tweaking. So, it can perform quite well if the message is unchanged because it uses the capabilities of the system.
How can a message be tweaked? If I want to demonstrate that a message can be tweaked, what do I have to do?
The suspected concern with Kafka performance is mentioned in this statement:
Suspicion:
Kafka’s performance reduces significantly if the message needs some
tweaking. So, it can perform quite well if the message is unchanged
Clarification:
As a user of Kafka for several years, I found Kafka's guarantee,
that a message cannot be altered when it is inside a queue, to be
one of its best features.
Kafka does not allow consumers to directly alter in-flight messages in the queue (topic).
Message content can be altered either before it is published to a topic, or after it is consumed from a topic only.
The business requirement that needs messages to be modified can be implemented using multiple topics, in this pattern:
(message) --> Topic 1 --> (consume & modify message) --> Topic 2
Using multiple topics for implementing the message modification functionality will only increase the storage requirement. It will not have any impact on performance.
Kafka's design of not allowing in-flight modification of messages provides 'Data integrity'. This is one of the driving factors behind the widespread use of Kafka in financial processing applications.
It's not clear what that means. That term is not used in official Kafka documentation, and messages are immutable once sent by a producer.

How preferred is Kafka in Applications in terms of processing time without having Filters as in JMS selectors.

Kafka does not have any concept of filters at the brokers that can ensure – messages that are being picked up by a consumer matches some criteria. The filtering has to happen at the consumers (or applications). - So in this case there is an increase in the processing time at the client/consumer.
In the case of JMS – if your messaging application needs to filter the messages it receives, you can use a JMS API message selector, which allows a message consumer to specify the messages it is interested in. Message selectors assign the work of filtering messages to the JMS provider rather than to the application. - So in this case there is an increase in the processing time at the JMS provider
Which of the above two proves to be better in terms of keeping the code clean and improving performance too?
I don't think your question can be answered conclusively as the performance will ultimately depend on the actual implementation (i.e. which specific JMS provider is used) as well as the particular use-case (e.g. details like the number of clients, the message volume, how many messages match the filters, network speed, etc.).

Reliable fire-n-forget Kafka producer implementation strategy

I'm in middle of a 1st mile problem with Kafka. Everybody deals with partitioning, etc. but how to handle the 1st mile?
My system consists of many applications producing events distributed on nodes. I need to deliver these events to a set of applications acting as consumers in a reliable/fail-safe way. The messaging system of choice is Kafka (due its log nature) but it's not set in stone.
The events should be propagated in a decoupled fire-n-forget manner as most as possible. This means the producers should be fully responsible for reliable delivering their messages. This means apps producing events shouldn't worry about the event delivery at all.
Producer's reliability schema has to account for:
box connection outage - during an outage producer can't access network at all; Kafka cluster is thus not reachable
box restart - both producer and event producing app restart (independently); producer should persist in-flight messages (during retrying, batching, etc.)
internal Kafka exceptions - message size was too large; serialization exception; etc.
No library I've examined so far covers these cases. Is there a suggested strategy how to solve this?
I know there are retriable and non-retriable errors during Producer's send(). On those retriable, the library usually handles everything internally. However, non-retriable ends with an exception in async callback...
Should I blindly replay these to infinity? For network outages it should work but how about Kafka internal errors - say message too large. There might be a DeadLetterQueue-like mechanism + replay. However, how to deal with message count...
About the persistence - a lightweight DB backend should solve this. Just creating a persistent queue and then removing those already send/ACKed. However, I'm afraid that if it was this simple it would be already implemented in standard Kafka libraries long time ago. Performance would probably go south.
Seeing things like KAFKA-3686 or KAFKA-1955 makes me a bit worried.
Thanks in advance.
We have a production system whose primary use case is reliable message delivery. I can't go in much detail, however i can share a high level design on how we achieve this. However this system is guarantees "atleast once delivery" messaging sematics.
Source
First we designed a message schema, and all the message sent to this
system must follow it.
Then we write the message to the a mysql message table, which is sharded by
date, with a field marked as delivered or not
We have a app constantly polling db, with rows marked un-delivered, picks up a row, constructs the message and send it to the load balancer, this is a blocking call and
updates the message row to delivered, only when returned 200
In case of 5xx, the app will retry the message with sleep back off. Also you can make the retries configurable as per your need.
Each source system maintains their own polling app and db.
Producer Array
This is basically a array of machines under a load balancer waiting for incoming messages and produce those to the Kafka Cluster.
We maintain 3 replicas of each topic and in the producer Config we keep acks = -1 , which is very important for your fire-n-forget requirement. As per the doc
acks=all This means the leader will wait for the full set of in-sync
replicas to acknowledge the record. This guarantees that the record
will not be lost as long as at least one in-sync replica remains
alive. This is the strongest available guarantee. This is equivalent
to the acks=-1 setting
As I said producing is a blocking call, and it will return 2xx if the message is produced succesfully across all 3 replicas.
4xx, if message is doesn't meet the schema requirements
5xx, if the kafka broker threw some exception.
Consumer Array
This is a normal array of machines, running Kafka High level Consumers for the topic's consumer groups.
We are currently running this setup with few additional components for some other functional flows in production and it is basically fire-n-forget from the source point of view.
This system addresses all of your concerns.
box connection outage : Unless the source polling app gets 2xx,it
will produce again-again which may lead to duplicates.
box restart : Due to retry mechanism of the source , this shouldn't be a problem as well.
internal Kafka exceptions : Taken care by polling app, as producer array will reply with 5xx unable to produce, and will be further retried.
Acks = -1, also ensures that all the replicas are in-sync and have a copy of the message, so broker going down will not be a issue as well.