I am using ArtemisMQ 2.17. When I look at each of the queues, the message count always shows 0, because we have listeners actively scanning and picking up messages quickly. However, the overall message count on the broker grows from 0 to upwards of millions, and this growth starts about 20 minutes after the broker starts running.
If I restart the broker service, my address memory is cleared. Sometimes (I haven't determined the interval yet) when I restart the service, I'll see messages show up in the ExpiryQueue. But this doesn't happen every time I restart the service, only sometimes.
Our application uses Spring JMS for the producer and listeners; the addresses/queues are multicast.
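Roughly, the wiring looks like the following sketch (bean names and the destination are placeholders, not our real code):

```java
import javax.jms.ConnectionFactory;

import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.jms.annotation.EnableJms;
import org.springframework.jms.annotation.JmsListener;
import org.springframework.jms.config.DefaultJmsListenerContainerFactory;
import org.springframework.jms.core.JmsTemplate;
import org.springframework.stereotype.Component;

@Configuration
@EnableJms
public class JmsConfig {

    @Bean
    public DefaultJmsListenerContainerFactory jmsListenerContainerFactory(ConnectionFactory cf) {
        DefaultJmsListenerContainerFactory factory = new DefaultJmsListenerContainerFactory();
        factory.setConnectionFactory(cf);
        factory.setPubSubDomain(true); // multicast (topic) semantics on the Artemis address
        return factory;
    }

    @Bean
    public JmsTemplate jmsTemplate(ConnectionFactory cf) {
        JmsTemplate template = new JmsTemplate(cf);
        template.setPubSubDomain(true); // producer publishes to the multicast address
        return template;
    }
}

@Component
class EventListener {

    // Listeners pick messages up almost immediately, which is why the per-queue count stays at 0.
    @JmsListener(destination = "example.events", containerFactory = "jmsListenerContainerFactory")
    public void onMessage(String body) {
        // process the message
    }
}
```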
Here is what I am seeing in the broker attributes section of the console. I would expect the message count to always be 0, but after 20 minutes or so it starts to climb indefinitely, and at that point the acknowledged and added counts no longer match up.
What can I do so that messages are picked up and completely removed? Or, what is happening here? Where are these messages going, and why do they randomly appear in my ExpiryQueue after some restarts?
Output of: ./artemis queue stat
View in console for broker level attributes:
Related
I am currently running NiFi 1.9.2 in a clustered environment with 3 nodes. Recently what I have noticed is that the flow seems to get stuck. The queue shows that there are items in the queue, but nothing is going to the downstream processor. When I list the items in the queue, I get "The queue has no FlowFiles".
The queue in this case is set to load balance with round robin. If I stop the downstream processor, change the configuration on the queue to not load balance, and then switch it back to round robin again, the queue items distribute to the other two nodes, and I can see the flow files when I list the items in the queue. However, it only shows items as being on two of the nodes. When I restart the downstream processor, 2/3 of the items get processed, leaving the 1/3 that would be on the node whose queue items I cannot see. This behavior seems to persist even after restarting the cluster service.
If I change the queue to not load balance, then everything seems to get put on a good node, and the queue gets emptied. So it looks like there might be something not correct on my first node.
Any suggestions on what to try?
Thanks,
-tj
You should check disk usage. If the usage rate of the disk where NiFi is located is equal to or higher than the "nifi.content.repository.archive.max.usage.percentage" setting in the nifi.properties file, you may see this kind of strange NiFi behavior. If you are in that situation, you can try deleting old NiFi log files to free up space.
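For reference, this is the property in question; the value shown below is the usual default, and the disk it is compared against is the one hosting the content repository (worth verifying for your install):

```properties
# nifi.properties
# If the partition holding the content repository is fuller than this percentage,
# NiFi stops archiving and flows can appear to stall.
nifi.content.repository.archive.max.usage.percentage=50%
```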
In my application, when I send messages I use the metadata in the callback to save the offset of the record for future use. However, sometimes metadata.offset() returns -1, which makes things hard later.
Why does this happen, and is there a way to get the offset without consuming the topic to find it?
Edit: I am currently on acks=0. When I switch to acks=1 I no longer get these -1 offsets, but my performance drops drastically: from 100k messages in 10 seconds to about a minute.
acks=0: If set to zero then the producer will not wait for any acknowledgment from the server at all. The record will be immediately added to the socket buffer and considered sent. No guarantee can be made that the server has received the record in this case, and the retries configuration will not take effect (as the client won't generally know of any failures). The offset given back for each record will always be set to -1.
This is not exactly true, as out of 100k messages I got 95k with offsets, but I guess that's normal.
I will still need to find another solution to get the offset with acks=0.
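For anyone hitting the same thing, here is a minimal sketch of the trade-off (broker address, topic name, and serializers are placeholders): with acks=0 the broker never reports an offset back, so the callback should treat it as unknown, while acks=1 or acks=all populates it at the cost of throughput.

```java
import java.util.Properties;

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

public class OffsetAwareProducer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        // acks=0: fire-and-forget, offsets in the callback may come back as -1
        // acks=1: leader acknowledges, offsets are populated but throughput drops
        props.put(ProducerConfig.ACKS_CONFIG, "1");

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            producer.send(new ProducerRecord<>("my-topic", "key", "value"), (metadata, exception) -> {
                if (exception != null) {
                    exception.printStackTrace();
                } else if (metadata.hasOffset()) {
                    // only trust the offset when the broker actually reported one
                    System.out.println("stored at offset " + metadata.offset());
                } else {
                    System.out.println("offset unknown (not reported by the broker)");
                }
            });
        }
    }
}
```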
[org.jgroups.protocols.pbcast.NAKACK] (requester=, local_addr=) message ::port not found in retransmission table of :port: (size=xxxx, missing=x, highest stability=xxxxx)]
NAKACK (or its newer cousin, NAKACK2) provides reliable transmission of messages to the cluster. To do this, every message gets a sequence number (seqno) and receivers deliver the message to the application in seqno order.
Every cluster member has a table of all other members and their messages (conceptually a list). When member P sends messages P21, P22 and P23, a receiver R first looks up the message list for P, then adds P21-P23 to that list.
However, in your case, the list for P was not found. This means that P was not a cluster member (anymore) from the receiver's point of view.
For example, if we have cluster {P,Q,R,T}, and member P leaves or is excluded because it was suspected (e.g. we didn't receive a heartbeat from it for a period of time), then messages P21-P23 will be dropped by any receiver.
This is because JGroups only allows cluster members to send and receive messages.
How can a member get excluded?
This is likely done by one of the failure detection protocols (e.g. FD_ALL or FD).
Another possibility is that your thread pools were clogged and failure detection heartbeat messages were dropped, leading to false suspicions.
Also, long GC pauses can cause this.
Fixes:
Increase the timeouts in FD_ALL or FD. The timeout should be longer than the longest GC cycle. Note that it will now take longer to detect hung members.
Size your thread pools appropriately, e.g. make sure that the maximum number of threads is large and the queue is disabled (a sample stack fragment is shown below).
Note that false suspicions can happen, but MERGE3 should remedy a split cluster later on.
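A sketch of what those two changes look like in a JGroups 3.x XML stack; the values are illustrative and should be tuned to your longest observed GC pause:

```xml
<!-- transport: large thread pool, queue disabled so heartbeats are not stuck behind work -->
<UDP thread_pool.min_threads="8"
     thread_pool.max_threads="200"
     thread_pool.queue_enabled="false"/>
<!-- ... PING, MERGE3, FD_SOCK ... -->
<!-- timeout longer than the longest GC cycle; detection of hung members becomes slower -->
<FD_ALL timeout="60000" interval="15000"/>
```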
I'm working with an application that requires the use of HornetQ queues.
It's kind of hit or miss for some reason. When I create a queue, the first message to that queue works, but a second does not, so I've tried using a new queue for each connection to the REST API that is running on JBoss. Sometimes this is okay; sometimes I get 412 Precondition Failed (when the same name is used more than once) or simply 500 internal errors.
The application has a /api/hornet-queue/queues/ path, but it doesn't allow GET requests.
Is there another way to tell what queues are open?
You are leaking a consumer, and the message is being held on that consumer.
Either reuse the same consumer, or close the consumer.
If you need to close consumers like this, set consumer-window-size to 0, so you won't cache messages on the client and waste resources.
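If these were plain HornetQ JMS consumers rather than REST ones, the "close the consumer / window size 0" advice would look roughly like the sketch below (queue name and connector are illustrative; for the REST interface the equivalent settings live in its own configuration):

```java
import javax.jms.Connection;
import javax.jms.MessageConsumer;
import javax.jms.Queue;
import javax.jms.Session;

import org.hornetq.api.core.TransportConfiguration;
import org.hornetq.api.jms.HornetQJMSClient;
import org.hornetq.api.jms.JMSFactoryType;
import org.hornetq.core.remoting.impl.netty.NettyConnectorFactory;
import org.hornetq.jms.client.HornetQConnectionFactory;

public class ConsumeOnce {
    public static void main(String[] args) throws Exception {
        HornetQConnectionFactory cf = HornetQJMSClient.createConnectionFactoryWithoutHA(
                JMSFactoryType.CF, new TransportConfiguration(NettyConnectorFactory.class.getName()));
        cf.setConsumerWindowSize(0); // don't buffer messages on the client side

        Connection connection = cf.createConnection();
        try {
            connection.start();
            Session session = connection.createSession(false, Session.AUTO_ACKNOWLEDGE);
            Queue queue = HornetQJMSClient.createQueue("exampleQueue");
            MessageConsumer consumer = session.createConsumer(queue);
            try {
                System.out.println(consumer.receive(5000));
            } finally {
                consumer.close(); // a leaked consumer keeps undelivered messages held on it
            }
        } finally {
            connection.close();
        }
    }
}
```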
JMS messages are sometimes moving to the DLQ without any exception being thrown.
The JBoss server instance used is 4.3.0.GA_CP04_EAP.
We are using an MDB that listens for incoming messages on a queue A; when it receives a message, it updates the database and sends an email in one transaction. The transaction is CMT.
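Roughly, the bean looks like this sketch (the queue name and helper methods are placeholders, not the actual code):

```java
import javax.ejb.ActivationConfigProperty;
import javax.ejb.MessageDriven;
import javax.ejb.TransactionAttribute;
import javax.ejb.TransactionAttributeType;
import javax.jms.Message;
import javax.jms.MessageListener;

// Illustrative sketch of the MDB described above.
@MessageDriven(activationConfig = {
        @ActivationConfigProperty(propertyName = "destinationType", propertyValue = "javax.jms.Queue"),
        @ActivationConfigProperty(propertyName = "destination", propertyValue = "queue/A")
})
public class QueueAListener implements MessageListener {

    @TransactionAttribute(TransactionAttributeType.REQUIRED) // CMT: DB update + email in one transaction
    public void onMessage(Message message) {
        updateDatabase(message); // placeholder for the JDBC/JPA work
        sendEmail(message);      // placeholder for the mail session call
        // an unhandled exception or rollback here triggers redelivery and, eventually, the DLQ
    }

    private void updateDatabase(Message m) { /* ... */ }
    private void sendEmail(Message m) { /* ... */ }
}
```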
Now, what is happening is that sometimes messages are not picked up by the consumer and end up in the DLQ. From the JMX console message count I could see that the message did arrive at queue A, but it then goes to the DLQ.
This happens intermittently and does not leave any exceptions in the logs either.
What seems to work most of the time is restarting the servers. I have no idea what happens behind the scenes, though.
And after 29 days, the same problem has returned.
This follows a pattern but varies with every restart.
There are 2 clustered servers, P1 and P2, which also do load balancing.
First two email messages go to and are processed by P1 - email sent.
Next email message request goes to P2 - email sent.
Next two email messages go to and are processed by P1 - email sent.
Next email message request goes to P2 - email NOT SENT.
And the cycle repeats.
I have found a workaround to this nagging problem thanks to the helpful info found at http://leakfromjavaheap.blogspot.in/2013/05/when-dead-letter-queue-becomes-zombie.html
A DLQ listener is set up to listen for any incoming messages and put them back to their intended destination whenever any are found on the DLQ.
Also, to cover the situation where a message travels from the DLQ to the queue and back to the DLQ in an endless loop, a counter checks how many times the message has been to the DLQ before; if it exceeds the limit, the message is put on a permanent DLQ (a DLQ for the DLQ).
Application has been running smoothly ever since.
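For anyone who wants to replicate it, the idea boils down to something like the following sketch; the visit limit, the property names, and the way the original destination is recovered are assumptions to adapt to your broker:

```java
import javax.jms.Destination;
import javax.jms.JMSException;
import javax.jms.Message;
import javax.jms.MessageListener;
import javax.jms.MessageProducer;
import javax.jms.Session;

// Sketch of the DLQ "redirector": messages found on the DLQ are sent back to their
// original destination unless they have already bounced too many times.
public class DlqRedirector implements MessageListener {

    private static final int MAX_DLQ_VISITS = 3; // illustrative limit
    private final Session session;
    private final Destination permanentDlq;      // the "DLQ for the DLQ"

    public DlqRedirector(Session session, Destination permanentDlq) {
        this.session = session;
        this.permanentDlq = permanentDlq;
    }

    public void onMessage(Message message) {
        try {
            int visits = message.propertyExists("dlqVisits") ? message.getIntProperty("dlqVisits") : 0;

            // The original queue name is assumed to be carried in a provider-specific property;
            // "origDestination" is a placeholder -- check what your broker actually sets on DLQ'd messages.
            String originalQueue = message.getStringProperty("origDestination");

            Destination target = (visits >= MAX_DLQ_VISITS || originalQueue == null)
                    ? permanentDlq
                    : session.createQueue(originalQueue);

            message.clearProperties();                   // make properties writable again
            message.setIntProperty("dlqVisits", visits + 1);

            MessageProducer producer = session.createProducer(target);
            try {
                producer.send(message);
            } finally {
                producer.close();
            }
        } catch (JMSException e) {
            throw new RuntimeException(e);
        }
    }
}
```

The key design point is the dlqVisits counter: without it, a poison message would bounce between the DLQ and its original queue forever.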
If you can provide the log details from when a message goes to the DLQ, it would be easier to dig into this issue.
The logs did not contain any useful info; not even an exception to give a hint.
Finally, I changed the local-tx data source to an XA data source and it was a success. I am still wondering about the reason behind it.
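For reference, on JBoss 4.x the change amounts to replacing the <local-tx-datasource> deployment with an <xa-datasource>, roughly like this sketch (JNDI name, driver class, and connection properties are illustrative, with Oracle used as an example):

```xml
<!-- example-xa-ds.xml: illustrative only; adapt the JNDI name, driver class and properties -->
<datasources>
  <xa-datasource>
    <jndi-name>MyXADS</jndi-name>
    <xa-datasource-class>oracle.jdbc.xa.client.OracleXADataSource</xa-datasource-class>
    <xa-datasource-property name="URL">jdbc:oracle:thin:@dbhost:1521:ORCL</xa-datasource-property>
    <xa-datasource-property name="User">appuser</xa-datasource-property>
    <xa-datasource-property name="Password">secret</xa-datasource-property>
  </xa-datasource>
</datasources>
```

Presumably the difference is that the XA data source lets the database update enlist in the same global transaction as the message consumption, but that is speculation on my part.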