I am trying to diagnose and fix what is likely an environmental problem. We have dev, SI, and production servers, and they have been set up the same way for several years. One of the environments has stopped working for a particular JBoss Messaging (JBM) queue, and I have so far been unable to figure out why.
What I am seeing via the JMX Console is that messages are "stuck" in the delivering state. The MessageCount and DeliveringCount increment each time a message is sent through the queue. The consumer's onMessage() is invoked, and it writes debug messages to the log4j log; however, I don't think it ever completes processing the message.
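For what it's worth, the same counters can also be read programmatically over JMX rather than through the web console. A minimal sketch, assuming a standard JMX remoting URL and the usual JBM queue ObjectName pattern (the URL and queue name below are placeholders):

```java
// Minimal sketch: read the queue counters over JMX instead of the web console.
// The service URL and queue name are placeholders; adjust for your setup
// (on JBoss 4.x you may need to look up jmx/invoker/RMIAdaptor via JNDI instead).
import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

public class QueueCounters {
    public static void main(String[] args) throws Exception {
        JMXServiceURL url = new JMXServiceURL(
                "service:jmx:rmi:///jndi/rmi://localhost:1090/jmxrmi"); // placeholder URL
        try (JMXConnector connector = JMXConnectorFactory.connect(url)) {
            MBeanServerConnection server = connector.getMBeanServerConnection();
            ObjectName queue = new ObjectName(
                    "jboss.messaging.destination:service=Queue,name=MyQueue"); // placeholder queue name
            System.out.println("MessageCount    = " + server.getAttribute(queue, "MessageCount"));
            System.out.println("DeliveringCount = " + server.getAttribute(queue, "DeliveringCount"));
        }
    }
}
```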
This is a persisted JBM setup. Restarting the JBoss server doesn't help. Clearing out, or even dropping, the JBM_* tables doesn't help either.
The JBM_MSG_REF entries have null TRANSACTION_IDs and a STATE of 'C', which suggests they were put into that state by the "ROLLBACK_MESSAGE_REF2" prepared statement from the oracle-persistence-service.xml we use.
The MaxPoolSize for the MDB consumer is 15, and this is also the maximum number of messages that the consumer instances ever receive. After 15, the queue seems to "fill up" and there are no longer any free consumer instances available to receive messages.
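For context, the consumer is a plain MDB, and as far as I understand it the container only returns an instance to the pool once onMessage() returns. A stripped-down sketch of that shape (EJB 2.1 style, with placeholder names rather than our real code):

```java
// Stripped-down placeholder for the consumer MDB (EJB 2.1 style). The point is
// that the container only returns an instance to the pool (and acknowledges the
// message) after onMessage() returns, so 15 blocked instances look like a "full" queue.
import javax.ejb.EJBException;
import javax.ejb.MessageDrivenBean;
import javax.ejb.MessageDrivenContext;
import javax.jms.Message;
import javax.jms.MessageListener;

public class MyConsumerBean implements MessageDrivenBean, MessageListener {

    private MessageDrivenContext context;

    public void setMessageDrivenContext(MessageDrivenContext ctx) throws EJBException {
        this.context = ctx;
    }

    public void ejbCreate() {
    }

    public void onMessage(Message message) {
        // The debug logging shows up here in log4j...
        // ...but if this call never returns (for example a hung DB or remote call),
        // this instance is never released back to the 15-instance pool.
        process(message);
    }

    private void process(Message message) {
        // placeholder for the real work
    }

    public void ejbRemove() throws EJBException {
    }
}
```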
I am looking for ideas or suggestions on how to diagnose and fix the problem. I've been googling and trying things for a few days with little to show for it. There are plenty of JIRA tickets for this fairly old version of JBM, but other instances of the same setup work fine, so I suspect there is some sort of network, race-condition, or environment issue on this one server/DB combination.
JBoss Remoting 4.3.0.GA
JBoss Messaging 1.4.0.SP3
JBoss 4.3.0.GA
Thanks!
The issue turned out to be caused by problems with the Oracle database; bouncing the database instance resolved it. Most likely, database performance was slow enough to cause a timing issue with message acknowledgement.
Related
We have containerized ActiveMQ Artemis 2.16.0 and deployed it as a Kubernetes Deployment, which we use with KEDA.
We use STOMP via the stomp.py Python module. The ack mode is set to client-individual and consumerWindowSize is 0 on the connection. We acknowledge each message promptly, as soon as we read it.
The problem is that, sometimes, the message count in the web console does not drop to zero even after all the messages have actually been consumed and acknowledged. When I browse the queue, I don't see any messages in it. This is causing KEDA to spin up pods unnecessarily. Please refer to the screenshots attached to the JIRA issue.
I fixed the issue in my application code. My requirement was that one queue listener should consume only one message and exit gracefully. So, as soon as I sent the ACK for the consumed message, I disconnected the connection instead of waiting out the sleep duration before disconnecting.
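In other words, the flow is just: receive one message, acknowledge it, and close the connection right away. We actually do this with stomp.py; the sketch below shows the equivalent flow with the Artemis JMS client purely for illustration, and the broker URL and queue name are made up:

```java
// Sketch of the fix as a flow: receive one message, ack it, close straight away.
// We really use stomp.py; this uses the Artemis JMS client only for illustration,
// and the broker URL and queue name below are made up.
import javax.jms.Connection;
import javax.jms.ConnectionFactory;
import javax.jms.Message;
import javax.jms.MessageConsumer;
import javax.jms.Session;
import org.apache.activemq.artemis.jms.client.ActiveMQConnectionFactory;

public class ConsumeOneAndExit {
    public static void main(String[] args) throws Exception {
        ConnectionFactory cf = new ActiveMQConnectionFactory("tcp://artemis:61616");
        try (Connection connection = cf.createConnection()) {
            connection.start();
            // CLIENT_ACKNOWLEDGE stands in for STOMP's client-individual ack here.
            Session session = connection.createSession(false, Session.CLIENT_ACKNOWLEDGE);
            MessageConsumer consumer = session.createConsumer(session.createQueue("work.queue"));
            Message message = consumer.receive(5000); // wait up to 5s for a single message
            if (message != null) {
                // ... process the message ...
                message.acknowledge(); // ack the consumed message
            }
        } // close the connection immediately instead of sleeping first
    }
}
```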
Thanks, Justin, for spending time on this.
Suppose I have three brokers running in a Kafka cluster and one of them fails due to an error, so I only have two running brokers left.
1) Usually, when this happens, will restarting the failed broker solve the problem?
2) If restarting the broker doesn't solve the problem, can I erase all the data that the failed broker had and restart it? (Because all the data will be restored from the two other brokers.) Is this method okay in production? If not, why not?
When I was testing Kafka on my Windows 10 desktop a long time ago, if a broker had an error and restarting the server didn't work, I erased all the data. Then it began to run okay again. (I am aware of the Kafka-on-Windows issues.) So I am curious whether this would work in a multi-broker Kafka environment on Linux.
Ultimately, it depends on what the error is. If it is a networking error, then there is nothing necessarily wrong with the logs, so you should leave them alone (unless they are not being replicated properly).
The main downside of deleting all data from a broker is that some topics may have only one replica, and it may be on that node. Or, if you lose other brokers while replication is catching up, then all of that data is potentially gone. Also, if many TB of data have to replicate back to one node, you have to be aware of any disk/network contention that may occur, and consider throttling the replication (in which case it could take hours for the node to become healthy again).
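Before wiping the data directory, it is worth checking whether any partition would be left with that broker as its only replica, or is already under-replicated. A rough sketch using the Kafka AdminClient (the bootstrap address is an assumption; point it at one of the healthy brokers):

```java
// Rough sketch: list partitions that have a single replica or are under-replicated,
// which is what you want to know before deleting a broker's data directory.
// The bootstrap address is an assumption; use one of the healthy brokers.
import java.util.Map;
import java.util.Properties;
import java.util.Set;
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.TopicDescription;
import org.apache.kafka.common.TopicPartitionInfo;

public class ReplicationCheck {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "broker1:9092");
        try (AdminClient admin = AdminClient.create(props)) {
            Set<String> topicNames = admin.listTopics().names().get();
            Map<String, TopicDescription> topics = admin.describeTopics(topicNames).all().get();
            for (TopicDescription topic : topics.values()) {
                for (TopicPartitionInfo partition : topic.partitions()) {
                    boolean singleReplica = partition.replicas().size() < 2;
                    boolean underReplicated = partition.isr().size() < partition.replicas().size();
                    if (singleReplica || underReplicated) {
                        System.out.printf("%s-%d replicas=%d isr=%d%n",
                                topic.name(), partition.partition(),
                                partition.replicas().size(), partition.isr().size());
                    }
                }
            }
        }
    }
}
```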
But yes, Windows and Linux ultimately behave the same in this regard, and it is one way to handle it in a clustered environment.
I am using the Atomikos Transaction Manager to manage distributed transactions in a standalone Spring Boot app that integrates ActiveMQ queues and a PostgreSQL DB (JPA via Hibernate 5), using Apache Camel.
My issue is that lots of messages like the one below are printed in my logs.
Purging orphaned entry from log: CoordinatorLogEntry [id=myapp148991647253713828, wasCommitted=true, state=COMMITTING]
Why are these logs printed all the time?
I believe that the timeouts (ActiveMQ component, datasource, or Atomikos) are not configured correctly, but I don't know where to start looking.
Any ideas?
It was really simple.
While messing with the configuration, I mistakenly set com.atomikos.icatch.forget_orphaned_log_entries_delay, changing the default value of 86400000 (1 day) to 30000 (30 sec). For now, I have just rolled back to the default value.
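For reference, this is the property involved. How it gets set depends on how Atomikos is bootstrapped; the sketch below assumes you are initializing Atomikos yourself via UserTransactionServiceImp (with Spring Boot the value normally comes from jta.properties or the application configuration instead), so treat the API usage as illustrative:

```java
// Sketch only: setting the delay programmatically when bootstrapping Atomikos yourself.
// The exact bootstrap API differs between Atomikos versions; with Spring Boot the value
// usually comes from jta.properties or the application configuration instead.
import java.util.Properties;
import com.atomikos.icatch.config.UserTransactionServiceImp;

public class AtomikosBootstrap {
    public static void main(String[] args) {
        Properties props = new Properties();
        // Default is 86400000 ms (1 day); my mistake was setting this to 30000 (30 sec),
        // which makes the "Purging orphaned entry from log" sweep run every 30 seconds.
        props.setProperty("com.atomikos.icatch.forget_orphaned_log_entries_delay", "86400000");
        UserTransactionServiceImp service = new UserTransactionServiceImp(props);
        service.init();
    }
}
```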
Team,
We are facing a strange issue in our web service application.
It has 6 WebLogic managed instances: 4 (m01, m02, m04, m05) handle web service requests and post messages to JMS queues, and 2 (m03, m06) are JMS instances hosting the MDB components that actually process the messages from the queues.
We have observed that one of the JMS instances (m06) suddenly stops processing messages, without any errors in the application or server logs. We have also observed that the connection factory stops responding. This in turn causes hogged threads in the service instances while posting and searching for messages on the JMS queues. We are not able to see any issue from the thread dumps either.
Adding to this, when we try to stop the m06 instance it does not go down; eventually we have to kill the instance process and start it again to resolve the issue. It then works fine for a few days, and then the issue resurfaces.
We are using WebLogic 12c.
Has anyone faced this kind of issue before, or does anyone have any idea what could have gone wrong? Your inputs are greatly appreciated.
If I were you, I would start by creating an error queue to get rid of any "poisoned" messages. More information can be found here: http://middlewaremagic.com/weblogic/?p=4670. Then check the error queue and the content of the messages that end up there.
Secondly, try turning off the mentioned instance (m06) entirely; if the bottleneck/errors do not appear on some other node, check the m06 instance configuration and compare it with the other nodes -> the issue will definitely be somewhere in there.
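Also, the next time it hangs, a thread dump that includes lock-owner and synchronizer details may show what the hogged threads are actually waiting on. A minimal sketch of capturing one from inside the JVM (plain java.lang.management, nothing WebLogic-specific; how you expose or trigger it is up to you):

```java
// Minimal sketch: capture a thread dump from inside the JVM with lock-owner and
// synchronizer information, which is what tends to show who is holding the lock
// that consumers of the connection factory are stuck waiting on.
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

public class ThreadDumper {
    public static String dump() {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        StringBuilder out = new StringBuilder();
        // true, true -> include locked monitors and ownable synchronizers
        for (ThreadInfo info : threads.dumpAllThreads(true, true)) {
            out.append(info.toString());
        }
        return out.toString();
    }
}
```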
We have a server app that is deployed across two server machines, each running JBoss 4.2.2. We use JBoss Messaging with MDBs to communicate between the systems. Currently we need to start the servers in a very specific order so that JBoss can connect properly. If a server starts and doesn't see its resources, it never tries again. This is problematic and time-consuming in testing when we're bouncing servers constantly. We believe that if we could specify some kind of retry flag, JBoss could reattempt to get the connection.
Is there a flag/config option in JBoss that would make it reattempt to obtain JMS connections when they fail at startup?
I am quite new to the JMS technology, so it is entirely possible that I have mixed up some terms here. Since this capability is to be used in-house, experimental or deprecated options are acceptable.
Edit: The problem is that a consumer starts up with no producer available and subsequently fails, never to try again. If a consumer and a producer are both up and the producer dies, the consumer will keep retrying until the producer comes back.
I'm 95% sure that JBoss MDBs do retry connections like that. If your MDBs are not receiving messages as you expect, I think something else is wrong. Do the MDBs depend on any other resources? Perhaps posting your EJB descriptors (META-INF/ejb-jar.xml and META-INF/jboss.xml) would help.
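If they do depend on another resource (for example a remote JMS provider registered as an MBean), JBoss can be told to delay MDB activation until that resource is up. A rough EJB3-style sketch, assuming JBoss's proprietary @Depends annotation is available in your 4.2.2 install; the queue and MBean names here are made up:

```java
// Rough sketch only: JBoss-specific dependency declaration for an EJB3 MDB.
// The annotation package and the MBean/queue names below are assumptions for
// illustration; check what your JBoss 4.2.2 EJB3 libraries actually provide.
import javax.ejb.ActivationConfigProperty;
import javax.ejb.MessageDriven;
import javax.jms.Message;
import javax.jms.MessageListener;
import org.jboss.annotation.ejb.Depends;

@MessageDriven(activationConfig = {
    @ActivationConfigProperty(propertyName = "destinationType", propertyValue = "javax.jms.Queue"),
    @ActivationConfigProperty(propertyName = "destination", propertyValue = "queue/remoteEvents")
})
@Depends("jboss.mq:service=JMSProviderLoader,name=RemoteJMSProvider") // hypothetical MBean name
public class RemoteEventConsumer implements MessageListener {
    public void onMessage(Message message) {
        // process the message
    }
}
```

For EJB 2.x MDBs, the equivalent is a <depends> element on the bean in jboss.xml.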