Divert on large files in HornetQ leaves a copy of the file behind

We have a queue that receives some large files and many small ones, so we use a divert to route the large files onto their own queue.
The divert is exclusive. However, we see that under the large messages folder there are two copies of the message (which is a BytesMessage - a file with properties) for every large message diverted. Once we consume the diverted message, one of them goes away, but the other is left behind until HornetQ is restarted. When it is restarted we see messages like the following:
[org.hornetq.core.server] HQ221018: Large message: 19,327,352,827 did not have any associated reference, file will be deleted
We use streaming over JMS to put them in and get them out.
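For context, this is roughly how we stream a file in and out over JMS (a simplified sketch; the session/producer/consumer setup is omitted and the paths are illustrative):

import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import javax.jms.BytesMessage;
import javax.jms.MessageConsumer;
import javax.jms.MessageProducer;
import javax.jms.Session;

public class LargeFileStreaming {

    // Sending: hand HornetQ an InputStream and let it stream the message body
    static void sendFile(Session session, MessageProducer producer, String path) throws Exception {
        BytesMessage message = session.createBytesMessage();
        message.setObjectProperty("JMS_HQ_InputStream",
                new BufferedInputStream(new FileInputStream(path)));
        producer.send(message);
    }

    // Receiving: hand HornetQ an OutputStream; this blocks until the whole body is saved
    static void receiveFile(MessageConsumer consumer, String path) throws Exception {
        BytesMessage received = (BytesMessage) consumer.receive(120000);
        received.setObjectProperty("JMS_HQ_SaveStream",
                new BufferedOutputStream(new FileOutputStream(path)));
    }
}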
Below is the divert configuration. By the way, anything over 100k is considered a large message in our HornetQ.
Are we missing anything, or did we just discover a bug?
HornetQ version is 2.3.0
<diverts>
   <divert name="large-message-divert">
      <routing-name>large-message-divert</routing-name>
      <address>jms.queue.FileDelivery</address>
      <forwarding-address>jms.queue.FileDelivery.large</forwarding-address>
      <filter string="_HQ_LARGE_SIZE IS NOT NULL AND _HQ_LARGE_SIZE > 52428800"/>
      <exclusive>true</exclusive>
   </divert>
</diverts>

This is likely a bug fixed in a later release of HornetQ; at the time I wrote this answer there had been at least two relevant fixes since 2.3.0:
HORNETQ-1292 - Delete large message from disk when message is dropped
HORNETQ-431 - Large Messages Files NOT DELETED on unbounded address

Related

Mirth - removing segments on its own

We are running Mirth 3.9.0 and noticed that some of our diet order messages are being stripped of segments. For example, we create an order which may contain two ORC and ODS segments along with an NTE afterwards. Despite our interface adding the extra segments, once Mirth sends the message the extra segments have been removed. We can test this by calling our API (via Postman) and seeing the resulting message contain the extra segments, while viewing the same message in Mirth Connect shows the segments are missing.
Why would Mirth remove the segments and how can we keep it from doing so?
I discovered that Mirth does not allow two ODS segments in a row. We added code to adjust for that by adding an extra ORC segment before the ODS. We didn't see any improvement, and then found out our master branch does not seem to match production. The change allowed it to work in our test branch but not in production. At this point it is not a Mirth issue but a deployment issue.
Sorry for wasting anyone's time on this.

Kafka - different configuration settings

I am going through the documentation, and there seem to be a lot of moving parts with respect to message processing, like exactly-once processing and at-least-once processing, and the settings are scattered here and there. There doesn't seem to be a single place that documents, even roughly, the properties that need to be configured for exactly-once and at-least-once processing.
I know there are many moving parts involved and it always depends. However, as I was mentioning before, what are the settings that need to be configured, at a minimum, to provide exactly-once, at-most-once, and at-least-once processing?
You might be interested in the first part of the Kafka FAQ, which describes some approaches to avoiding duplication during data production (i.e. on the producer side):
Exactly once semantics has two parts: avoiding duplication during data production and avoiding duplicates during data consumption.

There are two approaches to getting exactly once semantics during data production:

1. Use a single-writer per partition and every time you get a network error check the last message in that partition to see if your last write succeeded
2. Include a primary key (UUID or something) in the message and deduplicate on the consumer.

If you do one of these things, the log that Kafka hosts will be duplicate-free. However, reading without duplicates depends on some co-operation from the consumer too. If the consumer is periodically checkpointing its position then if it fails and restarts it will restart from the checkpointed position. Thus if the data output and the checkpoint are not written atomically it will be possible to get duplicates here as well. This problem is particular to your storage system. For example, if you are using a database you could commit these together in a transaction. The HDFS loader Camus that LinkedIn wrote does something like this for Hadoop loads. The other alternative that doesn't require a transaction is to store the offset with the data loaded and deduplicate using the topic/partition/offset combination.
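Since that FAQ entry was written, the Java clients have also gained idempotent and transactional producers. As a rough illustration only (assuming the Java kafka-clients library on a version that supports these settings; the right combination always depends on your pipeline), these are the properties most commonly associated with each guarantee:

import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.producer.ProducerConfig;

public class DeliverySemanticsConfig {

    // At-least-once: durable acks plus retries on the producer side.
    static Properties atLeastOnceProducer() {
        Properties p = new Properties();
        p.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        p.put(ProducerConfig.ACKS_CONFIG, "all");                 // wait for all in-sync replicas
        p.put(ProducerConfig.RETRIES_CONFIG, Integer.MAX_VALUE);  // retry on transient errors
        return p;
    }

    // At-most-once: fire-and-forget, no retries.
    static Properties atMostOnceProducer() {
        Properties p = new Properties();
        p.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        p.put(ProducerConfig.ACKS_CONFIG, "0");                   // do not wait for acknowledgement
        p.put(ProducerConfig.RETRIES_CONFIG, 0);
        return p;
    }

    // Exactly-once within Kafka: idempotent, transactional producer ...
    static Properties exactlyOnceProducer() {
        Properties p = atLeastOnceProducer();
        p.put(ProducerConfig.ENABLE_IDEMPOTENCE_CONFIG, true);
        p.put(ProducerConfig.TRANSACTIONAL_ID_CONFIG, "my-transactional-id");
        return p;
    }

    // ... paired with a consumer that only reads committed data and manages its own offsets.
    static Properties exactlyOnceConsumer() {
        Properties c = new Properties();
        c.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        c.put(ConsumerConfig.GROUP_ID_CONFIG, "my-group");
        c.put(ConsumerConfig.ISOLATION_LEVEL_CONFIG, "read_committed");
        c.put(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, false);
        return c;
    }
}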

Avoid Data Loss While Processing Messages from Kafka

I'm looking for the best approach for designing my Kafka consumer. Basically, I would like to find the best way to avoid data loss in case there are any exceptions/errors during processing of the messages.
My use case is as below.
a) The reason why I am using a SERVICE to process the messages is that in the future I am planning to write an ERROR PROCESSOR application, which would run at the end of the day and try to process the failed messages again (not all messages, but messages which fail because of missing dependencies, such as a missing parent).
b) I want to make sure there is zero message loss, so I will save the message to a file in case there are any issues while saving the message to the DB.
c) In the production environment there can be multiple instances of consumers and services running, so there is a high chance that multiple applications try to write to the same file.
Q-1) Is writing to a file the only option to avoid data loss?
Q-2) If it is the only option, how do we make sure multiple applications can write to the same file and read from it at the same time? Please consider that in the future, once the error processor is built, it might be reading messages from the same file while another application is trying to write to it.
ERROR PROCESSOR - Our source follows an event-driven mechanism, and there is a high chance that sometimes the dependent event (for example, the parent entity for something) might get delayed by a couple of days. In that case, I want my ERROR PROCESSOR to be able to process the same messages multiple times.
I've run into something similar before. So, diving straight into your questions:
Not necessarily; you could perhaps send those messages back to Kafka on a new topic (let's say - error-topic). Then, when your error processor is ready, it could just listen to this error-topic and consume those messages as they come in.
I think this question has been addressed in response to the first one. So, instead of writing to and reading from a file, and opening multiple file handles to do this concurrently, Kafka might be a better choice as it is designed for such problems.
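A minimal sketch of that error-topic idea, assuming the Java kafka-clients library (the topic names and the process() call are placeholders for your own service logic):

import java.time.Duration;
import java.util.Collections;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class ErrorTopicForwarder {

    static void run(KafkaConsumer<String, String> consumer,
                    KafkaProducer<String, String> producer) {
        consumer.subscribe(Collections.singletonList("input-topic"));
        while (true) {
            ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(1));
            for (ConsumerRecord<String, String> record : records) {
                try {
                    process(record.value());   // your service / DB write
                } catch (Exception e) {
                    // Park the failed message for the future ERROR PROCESSOR to retry later
                    producer.send(new ProducerRecord<>("error-topic", record.key(), record.value()));
                }
            }
        }
    }

    static void process(String value) { /* ... */ }
}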
Note: The following point is just some food for thought based on my limited understanding of your problem domain. So, you may just choose to ignore this safely.
One more point worth considering on your design for the service component - You might as well consider merging points 4 and 5 by sending all the error messages back to Kafka. That will enable you to process all error messages in a consistent way as opposed to putting some messages in the error DB and some in Kafka.
EDIT: Based on the additional information on the ERROR PROCESSOR requirement, here's a diagrammatic representation of the solution design.
I've deliberately kept the output of the ERROR PROCESSOR abstract for now just to keep it generic.
I hope this helps!
If you don't commit the consumed message's offset before writing to the database, then nothing would be lost while Kafka retains the message. The tradeoff is that if the consumer did commit to the database, but a Kafka offset commit fails or times out, you'd end up consuming records again and potentially have duplicates being processed in your service.
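A rough sketch of that ordering, assuming the Java client with enable.auto.commit set to false (saveToDatabase() stands in for your own persistence code):

import java.time.Duration;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class WriteThenCommit {

    static void consume(KafkaConsumer<String, String> consumer) {
        while (true) {
            ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(1));
            for (ConsumerRecord<String, String> record : records) {
                saveToDatabase(record.value());   // persist first...
            }
            consumer.commitSync();                // ...then commit the offset: a crash before this
                                                  // line means re-reading (duplicates), not data loss
        }
    }

    static void saveToDatabase(String value) { /* your DB write */ }
}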
Even if you did write to a file, you wouldn't be guaranteed ordering unless you opened a file per partition and ensured all consumers only ran on a single machine (because you're preserving state there, which isn't fault-tolerant). Deduplication would still need to be handled as well.
Also, rather than writing your own consumer to store in a database, you could look into the Kafka Connect framework. For validating messages, you can similarly deploy a Kafka Streams application to filter bad messages from an input topic into a separate topic that feeds the DB.
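For the validation piece, a minimal Kafka Streams sketch along those lines might look like this (the topic names and the isValid() check are placeholders):

import java.util.Properties;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.KStream;

public class MessageValidator {

    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "message-validator");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
        props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());

        StreamsBuilder builder = new StreamsBuilder();
        KStream<String, String> input = builder.stream("input-topic");
        input.filter((key, value) -> isValid(value)).to("valid-topic");       // fed to the DB sink
        input.filterNot((key, value) -> isValid(value)).to("invalid-topic");  // bad messages parked here

        new KafkaStreams(builder.build(), props).start();
    }

    static boolean isValid(String value) { /* your validation */ return value != null && !value.isEmpty(); }
}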

How to reject messages greater than 100 MB HornetQ

We would like to reject messages larger than 50 MB being placed in HornetQ.
Could we restrict this in the configuration at the queue/connection-factory level?
Placing large messages in HornetQ is causing heap issues and the server is crashing.
Any help appreciated.
Edit your XML configuration like this:
<address-setting match="your/queue/address">
   <max-size-bytes>104857600</max-size-bytes>
   <page-size-bytes>104857600</page-size-bytes>
   <address-full-policy>DROP</address-full-policy>
</address-setting>
From the docs:
Messages are stored per address on the file system.
Instead of paging messages when the max size is reached, an address can also be configured to just drop messages when the address is full.
...to do so, set address-full-policy to DROP (messages will be silently dropped).
The above settings are documented at:
https://docs.jboss.org/hornetq/2.2.5.Final/user-manual/en/html/queue-attributes.html
And, specifically concerning the message-size elements: https://docs.jboss.org/hornetq/2.2.5.Final/user-manual/en/html/paging.html

With MSMQ how do I avoid Insufficient Resources when importing a large number of messages into a queue?

What?
I have a private MSMQ transactional queue from which I need to export all (600k) messages, purge the queue, then import the messages back into it. When importing these messages I'm currently using a single transaction and getting an insufficient resources error. I can switch to using multiple transactions, but I need a way to work out how many messages I can process in a single transaction - any ideas?
Why?
If we don't periodically perform this operation the .mq files become bloated and fragmented. If there is another way to fix this problem let me know.
We had the same problem with MQ files when we ended up with 7,500 MQ files with a total size of about 30 gigabytes.
The solution is very easy.
You should purge the transactional dead-letter messages on this machine and, after that, restart the MSMQ service.
When the service starts it runs a defragmentation procedure. It compacts the used space and removes unused MQ files.