I am trying out ActiveMQ Artemis for a messaging design. I am expecting messages with embedded file content (bytes), and I do not expect them to be any bigger than 10MB. However, I want to know whether there is a configurable way to handle that in Artemis.
Also, is there a default maximum message size it supports?
I tried and searched for an answer but could not find any.
Also, my producer and consumer are both .NET AMQP implementations.
ActiveMQ Artemis itself doesn't place a limit on the size of the message. It supports arbitrarily large messages. However, you will be constrained by a few things:
The broker's heap space: If the client sends the message all in one chunk and that causes the broker to exceed its available heap space then sending the message will fail. The broker has no control over how the AMQP client sends the message. I believe AMQP supports sending messages in chunks but I'm not 100% certain of that.
The broker's disk space: AMQP messages that are deemed "large" by the broker (i.e. those that cannot fit into a single journal file) will be stored directly on disk in the data/largemessages directory. The ActiveMQ Artemis journal file size is controlled by the journal-file-size configuration parameter in broker.xml. The default journal-file-size is 10MB. By default the broker will stop providing credits to the producer when disk space utilization hits 90%. This is controlled by the max-disk-usage configuration parameter in broker.xml.
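For reference, here is a minimal sketch of where those two parameters live in broker.xml (the values shown are just the defaults described above):

    <core xmlns="urn:activemq:core">
       <!-- journal files are 10MiB each; messages too large for a journal file go to the large messages directory -->
       <journal-file-size>10M</journal-file-size>
       <!-- stop granting producer credits once disk utilization reaches 90% -->
       <max-disk-usage>90</max-disk-usage>
    </core>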
From reading the Artemis docs I understood that Artemis keeps all currently active messages in memory, can offload messages to a paging area for a given queue/topic according to the settings, and that the Artemis journals are append-only.
With respect to this:
How and when does the broker sync messages to and from the journal (only during restart?)?
How does it identify which message to delete from the journal? (For example: if the journal is append-only and a consumer ACKs a persistent message, how does the broker remove that single message from the journal without keeping an index?)
Isn't it a performance hit to keep every active message in memory, and couldn't that even make the broker run out of memory? To avoid this, paging settings would have to be configured for every queue/topic, otherwise the broker may fill up with messages. Please correct me if I'm wrong.
Any reference link that explains message sync and these details would be helpful. The Artemis docs do cover the append-only mode, but maybe there is a section/article explaining these storage concepts that I'm missing.
By default, a durable message is persisted to disk after the broker receives it and before the broker sends a response back to the client that the message was received. In this way the client can know for sure that if it receives the response back from the broker that the durable message it sent was received and persisted to disk.
When using the NIO journal-type in broker.xml (i.e. the default configuration), data is synced to disk using java.nio.channels.FileChannel.force(boolean).
Since the journal is append-only during normal operation then when a message is acknowledged it is not actually deleted from the journal. The broker simply appends a delete record to the journal for that particular message. The message will then be physically removed from the journal later during "compaction". This process is controlled by the journal-compact-min-files & journal-compact-percentage parameters in broker.xml. See the documentation for more details on that.
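As a rough sketch (the values are the documented defaults), the sync and compaction behavior described above is controlled by these broker.xml parameters:

    <core xmlns="urn:activemq:core">
       <!-- the journal implementation discussed above -->
       <journal-type>NIO</journal-type>
       <!-- sync durable messages to disk before responding to the client -->
       <journal-sync-transactional>true</journal-sync-transactional>
       <journal-sync-non-transactional>true</journal-sync-non-transactional>
       <!-- compaction runs once there are at least 10 journal files... -->
       <journal-compact-min-files>10</journal-compact-min-files>
       <!-- ...and live data occupies less than 30% of the journal -->
       <journal-compact-percentage>30</journal-compact-percentage>
    </core>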
Keeping message data in memory actually improves performance dramatically vs. evicting it from memory and then having to read it back from disk later. As you note, this can lead to memory consumption problems which is why the broker supports paging, blocking, etc. The main thing to keep in mind is that a message broker is not a storage medium like a database. Paging is a palliative measure meant to be used as a last resort to keep the broker functioning. Ideally the broker should be configured to handle the expected load without paging (e.g. acquire more RAM, allocate more heap). In other words, message production and message consumption should be balanced. The broker is designed for messages to flow through it. It can certainly buffer messages (potentially millions depending on the configuration & hardware) but when it's forced to page the performance will drop substantially simply because disk is orders of magnitude slower than RAM.
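To make the paging/blocking options concrete, here is a hedged broker.xml sketch; the match and size values are purely illustrative, not recommendations:

    <address-settings>
       <address-setting match="#">
          <!-- once an address holds ~100MiB of messages in memory... -->
          <max-size-bytes>104857600</max-size-bytes>
          <!-- ...start writing them to page files of 10MiB each -->
          <page-size-bytes>10485760</page-size-bytes>
          <!-- PAGE as described above; BLOCK, FAIL, and DROP are the alternatives -->
          <address-full-policy>PAGE</address-full-policy>
       </address-setting>
    </address-settings>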
I am using ActiveMQ Artemis 2.11.0 with the native, file-based persistence configuration. Most of the content of my messages is around 5K characters. However, the "Persistent Size" of the message is often 7 to 10 times larger than that.
The message data is a JSON string of various sizes. The "Persistent Size" value that I'm seeing comes from the queue browsing feature of the ActiveMQ Artemis web console. For example, a message with a display body character count of 4553 characters has a persistent size of 26,930 bytes. The persisted record is 6 times larger than the message itself.
There are headers in the message, but not enough I think to account for the difference in message and persisted record size.
Can anyone please tell me why this is and whether or not I can do something to reduce the persisted size of messages?
I have enabled Snappy compression on the producer side with a batch size of 64KB, I am processing messages of about 1KB each, and the linger time is set to infinity. Does this mean that until I have produced 64 messages, the producer won't send them to the Kafka output topic?
In other words, will the producer send each message to Kafka individually, or wait for 64 messages and send them in a single batch?
I ask because the offsets are increasing one by one rather than in multiples of 64.
Edit: I am using the flink-kafka connectors.
Messages are batched by the producer so that network usage is minimized, not so that they are written "as a batch" into Kafka's commit log. What you are seeing is correct behavior by Kafka: each message needs to be accounted for individually, i.e. its key/partition relationship is identified, it is appended to the commit log, and then the offset is incremented. Unless the first two steps are done, the offset is not incremented.
There is also data replication to take care of, depending on configuration, and the message-tracking metadata is updated for each message received (to support the lag APIs).
Also note that the batch.size parameter applies to the ready-to-ship size of the records, i.e. after they have been serialized by your favorite serializer and compressed.
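To make those settings concrete, here is a rough sketch of the equivalent plain Kafka producer configuration (the bootstrap address and topic name are made up; with the flink-kafka connector the same properties would be passed to the connector's producer):

    import java.util.Properties;
    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerRecord;

    public class BatchingProducerSketch {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("bootstrap.servers", "localhost:9092"); // made-up address
            props.put("key.serializer",
                    "org.apache.kafka.common.serialization.StringSerializer");
            props.put("value.serializer",
                    "org.apache.kafka.common.serialization.StringSerializer");
            props.put("compression.type", "snappy");   // each batch is compressed before it is sent
            props.put("batch.size", 65536);            // up to 64KB buffered per partition batch
            props.put("linger.ms", Integer.MAX_VALUE); // wait "forever" for a batch to fill

            try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
                // records are appended to the in-flight batch for their partition, but the
                // broker still assigns each record its own offset when the batch arrives
                producer.send(new ProducerRecord<>("out-topic", "key", "a ~1KB payload"));
            }
        }
    }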
After enabling gzip compression, will messages stored earlier also get compressed? And when messages are sent to the consumer, is the message content changed, or does Kafka internally decompress it?
If you turn on broker-side compression, existing messages are unchanged; compression will apply only to new messages. When consumers fetch the data, it will be automatically decompressed, so you don't have to handle it on the consumer side. Just remember that this type of compression can come with a CPU and latency cost.
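As an illustration (not something from the question), broker-side compression can be set per topic via the compression.type topic config. Here is a hedged sketch using the Java AdminClient on a reasonably recent Kafka (incrementalAlterConfigs needs brokers/clients at 2.3 or later; the topic name and address are made up):

    import java.util.Collection;
    import java.util.Collections;
    import java.util.Map;
    import java.util.Properties;
    import org.apache.kafka.clients.admin.AdminClient;
    import org.apache.kafka.clients.admin.AlterConfigOp;
    import org.apache.kafka.clients.admin.ConfigEntry;
    import org.apache.kafka.common.config.ConfigResource;

    public class TopicCompressionSketch {
        public static void main(String[] args) throws Exception {
            Properties props = new Properties();
            props.put("bootstrap.servers", "localhost:9092"); // made-up address
            try (AdminClient admin = AdminClient.create(props)) {
                ConfigResource topic = new ConfigResource(ConfigResource.Type.TOPIC, "my-topic"); // made-up topic
                AlterConfigOp setGzip = new AlterConfigOp(
                        new ConfigEntry("compression.type", "gzip"), AlterConfigOp.OpType.SET);
                Map<ConfigResource, Collection<AlterConfigOp>> updates =
                        Collections.singletonMap(topic, Collections.singletonList(setGzip));
                // only data written after this change is gzip-compressed on the broker;
                // existing log segments stay exactly as they were written
                admin.incrementalAlterConfigs(updates).all().get();
            }
        }
    }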
We are using Kafka 0.10.x. I am looking for a way to stop a Kafka publisher from sending messages after a certain message count/limit is reached in an hour. The goal here is to restrict a user to sending only a certain number of messages per hour/day.
If anyone has come across a similar use case, please share your findings.
Thanks in advance.
Kafka has a few throttling and quota mechanisms but none of them exactly match your requirement to strictly limit a producer based on message count on a daily basis.
From the Apache Kafka 0.11.0.0 documentation at https://kafka.apache.org/documentation/#design_quotas
Kafka cluster has the ability to enforce quotas on requests to control the broker resources used by clients. Two types of client quotas can be enforced by Kafka brokers for each group of clients sharing a quota:
Network bandwidth quotas define byte-rate thresholds (since 0.9)
Request rate quotas define CPU utilization thresholds as a percentage of network and I/O threads (since 0.11)
Client quotas were first introduced in Kafka 0.9.0.0. Rate limits on producers and consumers are enforced to prevent clients saturating the network or monopolizing broker resources.
See KIP-13 for details: https://cwiki.apache.org/confluence/display/KAFKA/KIP-13+-+Quotas
The quota mechanism introduced in 0.9 was based on the client.id set in the client configuration, which can be changed easily. Ideally, a quota should be set on the authenticated user name so it is not easy to circumvent, so in 0.10.1.0 an additional quota feature for authenticated users was added.
See KIP-55 for details: https://cwiki.apache.org/confluence/display/KAFKA/KIP-55%3A+Secure+Quotas+for+Authenticated+Users
Both of the quota mechanisms described above work on data volume (i.e. bandwidth throttling) and not on the number of messages or the number of requests. If a client sends lots of small messages or makes lots of requests that return no messages (e.g., a consumer with fetch.min.bytes configured to 0), it can still overwhelm the broker. To address this issue, 0.11.0.0 additionally added support for throttling by request rate.
See KIP-124 for details: https://cwiki.apache.org/confluence/display/KAFKA/KIP-124+-+Request+rate+quotas
With all that as background then, if you know that your producer always publishes messages of a certain size, then you can compute a daily limit expressed in MB and also a rate limit expressed in MB/sec, which you can configure as a quota. That's not a perfect fit for your need, because a producer might send nothing for 12 hours and then try to send at a faster rate for a short time, and the quota would still limit them to a lower publish rate, because the limit is enforced per second and not per day.
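For example (a sketch with a made-up user name and limit; user-level quotas require 0.10.1.0 or later), a 1 MB/sec producer byte-rate quota for an authenticated user can be set with the kafka-configs tool:

    bin/kafka-configs.sh --zookeeper localhost:2181 --alter \
      --add-config 'producer_byte_rate=1048576' \
      --entity-type users --entity-name alice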
If you don't know the message size or it varies a lot, then since messages are published using a produce request, you could use request rate throttling to somewhat control the rate at which an authenticated user is allowed to publish messages, but again it would not be a message/day limit, nor even a bandwidth limit, but rather a "CPU utilization threshold as a percentage of network and I/O threads". This helps more for avoiding DoS problems and not really for limiting message counts.
If you would like to see message count quotas or message storage quotas added to Kafka then clearly the Kafka Improvement Proposal (KIP) process works and you are encouraged to submit improvement proposals in this or any other area.
See KIP process for details: https://cwiki.apache.org/confluence/display/KAFKA/Kafka+Improvement+Proposals
You can make use of these broker configs:
message.max.bytes (default: 1000000) – the maximum size of a message the broker will accept. This has to be smaller than the consumer's fetch.message.max.bytes, or the broker will have messages that can't be consumed, causing consumers to hang.
log.segment.bytes (default: 1GB) – the size of a Kafka data file. Make sure it's larger than one message. The default should be fine (i.e. large messages probably shouldn't exceed 1GB in any case; it's a messaging system, not a file system).
replica.fetch.max.bytes (default: 1MB) – the maximum size of data that a broker can replicate. This has to be larger than message.max.bytes, or a broker will accept messages and fail to replicate them, leading to potential data loss.
I think you can tweak the config to do what you want.
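For example, in server.properties (the values simply mirror the defaults quoted above):

    # maximum size of a message the broker will accept
    message.max.bytes=1000000
    # size of each log segment file
    log.segment.bytes=1073741824
    # maximum size of data fetched when replicating; keep this >= message.max.bytes
    replica.fetch.max.bytes=1048576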