How can we have JBOSS MDB retry its connection if it fails at startup? - jboss

We have a server app that is deployed across two server machines, each running JBoss 4.2.2. We use JBoss Messaging with MDBs to communicate between the systems. Currently we need to start the servers in a very specific order so that JBoss can connect properly. If a server starts and doesn't see its resources, it never tries again. This is problematic and time consuming in testing when we're bouncing servers constantly. We believe that if we could specify a retry flag, JBoss could reattempt the connection.
Is there a flag/config option in JBoss that would make it reattempt to obtain JMS connections after a failure at startup?
I am quite new to JMS, so it is entirely possible that I have mixed up some terms here. Since this capability is only to be used in house, experimental or deprecated options are acceptable.
Edit: The problem is that a consumer starts up with no producer available, fails, and never tries again. If a consumer and a producer are both up and the producer then dies, the consumer will keep retrying until the producer comes back.

I'm 95% sure that JBoss MDBs do retry connections like that. If your MDBs are not receiving messages as you expect, I think something else is wrong. Do the MDBs depend on any other resources? Perhaps posting your EJB descriptors (META-INF/ejb-jar.xml and META-INF/jboss.xml) would help.
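For reference, here is a minimal sketch of the kind of MDB the container manages (EJB3-style annotations; the class name and queue name are placeholders, not taken from the question). The real class plus the ejb-jar.xml/jboss.xml descriptors are what would reveal a missing dependency or a wrong destination name.

```java
import javax.ejb.ActivationConfigProperty;
import javax.ejb.MessageDriven;
import javax.jms.Message;
import javax.jms.MessageListener;

// Hypothetical MDB: the container (not application code) owns the JMS
// connection, so reconnect behaviour is a container/invoker concern.
@MessageDriven(activationConfig = {
    @ActivationConfigProperty(propertyName = "destinationType",
                              propertyValue = "javax.jms.Queue"),
    @ActivationConfigProperty(propertyName = "destination",
                              propertyValue = "queue/ExampleQueue")
})
public class ExampleMDB implements MessageListener {

    public void onMessage(Message message) {
        // Business logic goes here; failures thrown from onMessage() cause
        // redelivery, but they are not the same thing as a lost connection.
        System.out.println("Received: " + message);
    }
}
```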

Related

How to add health check for topics in KafkaStreams api

I have a critical Kafka application that needs to be up and running all the time. The source topics are created by the Debezium Kafka Connect connector for the MySQL binlog. Unfortunately, many things can go wrong with this setup. The Debezium connectors often fail and need to be restarted, and so do my apps (without throwing any exception they just hang and stop consuming). My manual way of testing and discovering a failure is checking the Kibana logs and then consuming the suspicious topic through the terminal. I can mimic this in code, but that is obviously far from best practice. I wonder whether the Kafka Streams API provides the ability to do such a health check, and to check other parts of the Kafka cluster.
Another point that bothers me is whether I can keep the stream alive and rejoin the topics once the connectors are up again.
You can check the Kafka Streams state to see if it is rebalancing/running, which would indicate healthy operation. However, if no data is getting into the topology, there would be no errors to observe either, so you also need to look up the health of your upstream dependencies.
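As a rough sketch of that state check (assuming `streams` is whatever KafkaStreams instance your application already builds; the wrapper class itself is hypothetical):

```java
import org.apache.kafka.streams.KafkaStreams;

public class StreamsHealth {

    private final KafkaStreams streams;

    public StreamsHealth(KafkaStreams streams) {
        this.streams = streams;
    }

    // Healthy while the topology is running or merely rebalancing.
    public boolean isHealthy() {
        KafkaStreams.State state = streams.state();
        return state == KafkaStreams.State.RUNNING
                || state == KafkaStreams.State.REBALANCING;
    }

    // Optionally register this before streams.start() to react to a
    // transition into ERROR or NOT_RUNNING (alert, restart, etc.).
    public void registerListener() {
        streams.setStateListener((newState, oldState) ->
                System.out.println("Streams state: " + oldState + " -> " + newState));
    }
}
```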
Overall, it sounds like you might want to invest some time into monitoring tools like Consul or Sensu, which can run local service health checks and send out alerts when services go down, or at the very least Elasticsearch alerting.
As far as Kafka health checking goes, you can do that in several ways:
Are the broker and ZooKeeper processes running? (SSH to the node and check the processes)
Are the broker and ZooKeeper ports open? (use a socket connection)
Are there important JMX metrics you can track? (Metricbeat)
Can you find an active controller broker? (use AdminClient#describeCluster)
Is the required minimum number of brokers responding as part of the controller metadata (which can be obtained from the AdminClient)?
Do the topics that you use have the proper configuration (retention, min-isr, replication factor, partition count, etc.)? (again, use AdminClient; see the sketch below)
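A minimal sketch of the AdminClient-based checks from the list above; the bootstrap address, timeouts, and topic name are placeholders:

```java
import java.util.Collections;
import java.util.Properties;
import java.util.concurrent.TimeUnit;

import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.DescribeClusterResult;
import org.apache.kafka.clients.admin.TopicDescription;
import org.apache.kafka.common.Node;

public class KafkaClusterCheck {

    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(AdminClientConfig.REQUEST_TIMEOUT_MS_CONFIG, "5000");

        try (AdminClient admin = AdminClient.create(props)) {
            DescribeClusterResult cluster = admin.describeCluster();

            // Active controller and number of brokers that responded.
            Node controller = cluster.controller().get(5, TimeUnit.SECONDS);
            int brokers = cluster.nodes().get(5, TimeUnit.SECONDS).size();
            System.out.println("Controller: " + controller + ", brokers: " + brokers);

            // Partition count (and, via partitions(), replica info) of a topic.
            TopicDescription topic = admin
                    .describeTopics(Collections.singleton("my-source-topic"))
                    .all().get(5, TimeUnit.SECONDS)
                    .get("my-source-topic");
            System.out.println("Partitions: " + topic.partitions().size());
        }
    }
}
```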

Messages delivered twice at same instant to AMQP message consumer (ActiveMQ Artemis)

I recently had to roll back from WF13 to WF11 due to a regression in one of the dependencies.
Now I am trying to get the AMQP protocol to work on WildFly 11's messaging system. I am running a high-availability setup with two nodes. Each of the nodes has a message consumer locally. This message consumer connects through AMQP 1.0. I've added io.netty as a dependency to the org/apache/activemq/artemis/protocol/amqp module and updated org/apache/qpid to get the AMQP protocol to work (see also WFLY-7823). Now my AMQP message consumer works, but it seems to receive every message exactly twice, apparently even in the same frame. This happens on the same node (the other node receives messages through the bridge if the message isn't handled locally in the first place). So with one node and one queue consumer, I receive every message exactly twice, at the very same instant, before I have even sent an ACK/NACK for the first copy.
I don't remember seeing this issue on WildFly 13.
Are there any known regressions regarding how messages are sent through the remote connectors? Perhaps an issue in the AMQP protocol? Or could it be a compatibility issue with the updated version of Qpid?
For what it's worth, WildFly only uses ActiveMQ Artemis to fulfill its need for a JMS implementation (i.e. a traditional part of Java EE). I don't believe WildFly has any real interest in protocols or APIs other than JMS.
My understanding is that if you need to support multi-protocol messaging use-cases you should likely be using a standalone broker. I believe this is why none of the other protocols which Artemis supports (e.g. AMQP, STOMP, MQTT, OpenWire) are enabled in WildFly or have documentation on how to enable them (aside from the occasional forum post).
It's also worth noting that WildFly 11 ships Artemis 1.5.5, which is 10 or so releases behind the latest version (i.e. 2.6.2). Much work has been done on the AMQP implementation in those releases, so I think you're best served by using a standalone Artemis rather than the Artemis embedded in WildFly.
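If you do try a standalone broker, a small Qpid JMS (AMQP 1.0) consumer along these lines is a quick way to check whether the duplicate delivery reproduces outside WildFly; the broker URL and queue name are placeholders:

```java
import javax.jms.Connection;
import javax.jms.ConnectionFactory;
import javax.jms.Message;
import javax.jms.MessageConsumer;
import javax.jms.Queue;
import javax.jms.Session;

import org.apache.qpid.jms.JmsConnectionFactory;

public class AmqpDuplicateCheck {

    public static void main(String[] args) throws Exception {
        ConnectionFactory factory = new JmsConnectionFactory("amqp://localhost:5672");
        try (Connection connection = factory.createConnection()) {
            connection.start();
            Session session = connection.createSession(false, Session.CLIENT_ACKNOWLEDGE);
            Queue queue = session.createQueue("exampleQueue");
            MessageConsumer consumer = session.createConsumer(queue);

            // Print message IDs: a duplicate shows up as the same ID twice
            // before any acknowledgement has been sent.
            for (int i = 0; i < 10; i++) {
                Message message = consumer.receive(5000);
                if (message == null) {
                    break;
                }
                System.out.println("Received: " + message.getJMSMessageID());
                message.acknowledge();
            }
        }
    }
}
```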

Weblogic 12c - Suddenly JMS Server stopped processing messages

Team,
We are facing a strange issue in our webservice application.
It has 6 WebLogic managed instances: 4 (m01, m02, m04, m05) handle web service requests and post messages to the JMS queues, and 2 (m03, m06) are JMS instances whose MDB components actually process the messages from the queues.
We have observed that one of the JMS instances (M06) stops processing messages all of a sudden, without any errors in the application or server logs. We observed that the connection factory is not responding. This also causes hogging threads in the service instances while posting messages to and searching for messages in the JMS queues. We are not able to see any issue from the thread dumps either.
Adding to this, when we try to stop the M06 instance it does not go down; eventually we had to kill the instance process and restart it to resolve the issue. It then works fine for a few days before the issue resurfaces.
We are using WebLogic 12c.
Has anyone faced this kind of issue before, or does anyone have an idea of what could have gone wrong? Your inputs are greatly appreciated.
If I were you, I would start by creating an error queue to get rid of any "poisoned" messages. More information can be found here: http://middlewaremagic.com/weblogic/?p=4670. Then check the error queue and the content of the messages that land there.
Secondly, try turning the mentioned instance (M06) off entirely; if the bottleneck/errors do not appear on some other node, check the M06 instance configuration and compare it with the other nodes -> the issue will most likely be somewhere in there.
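The error destination and redelivery limit themselves are configured on the WebLogic queue, not in code, but a defensive onMessage() along these lines (a sketch with a placeholder JNDI name, not your actual MDB) keeps a single poisoned message from tying up the MDB pool while still letting transient failures roll back and be redelivered:

```java
import javax.ejb.MessageDriven;
import javax.jms.JMSException;
import javax.jms.Message;
import javax.jms.MessageListener;
import javax.jms.TextMessage;

@MessageDriven(mappedName = "jms/exampleQueue") // placeholder queue binding
public class DefensiveMDB implements MessageListener {

    public void onMessage(Message message) {
        try {
            if (!(message instanceof TextMessage)) {
                // Poisoned/unexpected payload: log and drop instead of
                // forcing endless redelivery.
                System.err.println("Dropping unexpected message type: " + message);
                return;
            }
            process(((TextMessage) message).getText());
        } catch (JMSException e) {
            // Unreadable payload: also treat as poisoned.
            System.err.println("Dropping unreadable message: " + e.getMessage());
        }
        // RuntimeExceptions from process() are deliberately not caught, so the
        // container rolls back and WebLogic's redelivery limit / error
        // destination can take over for genuinely transient failures.
    }

    private void process(String body) {
        // Business logic placeholder.
    }
}
```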

OpenShift message queue

I'd like to host apps on OpenShift that use a queue to communicate with each other.
One kind of app, the producers, will put data on the queue, and another kind, the consumers, will process the messages. My question is how to implement the message queue. I've thought about two approaches:
Create an app with JBoss, HornetQ, and the consumer, and create a proxy port for HornetQ so that producers can send messages to it.
Create an app with JBoss and the consumer, and make JBoss's HornetQ available to producers. This sounds a bit better to me, but I don't know whether I can make the queue available to producers, or how it works when there are several consumer instances on different nodes (and different JBoss instances).
I'm not sure how else to answer you besides showing you a link on how to use WildFly. You can just use the WildFly cartridge:
https://www.openshift.com/quickstarts/wildfly-8
If you provide some extra context I can try to enrich the answer a bit; I need to know what your problem is and what's not working.
If you just want to know how to configure WildFly with HornetQ, the WildFly cartridge I posted is the way to go.
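For the second approach, a producer on another gear would do a remote JNDI lookup against the JBoss/WildFly instance, roughly as sketched below. The host, credentials, and JNDI names are placeholders, and the queue and connection factory have to be exposed under java:jboss/exported/ on the server for remote clients to see them:

```java
import java.util.Properties;

import javax.jms.Connection;
import javax.jms.ConnectionFactory;
import javax.jms.MessageProducer;
import javax.jms.Queue;
import javax.jms.Session;
import javax.naming.Context;
import javax.naming.InitialContext;

public class RemoteProducer {

    public static void main(String[] args) throws Exception {
        Properties env = new Properties();
        env.put(Context.INITIAL_CONTEXT_FACTORY,
                "org.jboss.naming.remote.client.InitialContextFactory");
        env.put(Context.PROVIDER_URL, "http-remoting://my-app.example.com:8080");
        env.put(Context.SECURITY_PRINCIPAL, "appuser");       // placeholder
        env.put(Context.SECURITY_CREDENTIALS, "apppassword"); // placeholder

        Context ctx = new InitialContext(env);
        ConnectionFactory cf = (ConnectionFactory) ctx.lookup("jms/RemoteConnectionFactory");
        Queue queue = (Queue) ctx.lookup("jms/queue/exampleQueue"); // placeholder

        try (Connection connection = cf.createConnection("appuser", "apppassword")) {
            Session session = connection.createSession(false, Session.AUTO_ACKNOWLEDGE);
            MessageProducer producer = session.createProducer(queue);
            producer.send(session.createTextMessage("hello from a producer gear"));
        }
    }
}
```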

JBoss Messaging Queue Stuck, with Remote Interface and MDB Consumer

I am trying to diagnose and fix what is likely an environmental problem. We have dev, SI, and production servers, and they have been setup the same for several years. One of the environments has stopped working for a particular JBM Queue, and I have so far been unable to figure out why.
What I am seeing via the JMX console is that the messages are "stuck" in the delivering state. The MessageCount and DeliveringCount increment each time a message is sent through the queue. The consumer's onMessage() is invoked, and it outputs debug messages into the log4j log; however, I don't think it ever completes the request.
This is a persisted JBM setup. Restarting the JBoss Server doesn't help. Clearing out or even dropping the JBM_* tables does not help.
The JBM_MSG_REF entries have null transaction_ids and the state is 'C', which looks like it was put into this state by the prepared statement "ROLLBACK_MESSAGE_REF2" from the oracle-persistence-service.xml we use.
The MaxPoolSize for the MDB consumer is 15, and this is also the maximum number of messages that get picked up by the consumer instances. After 15, it seems that the queue "fills up" and there are no longer any available consumer MBeans to receive messages.
I am looking for ideas or suggestions about how to diagnose and fix the problem. I've been googling and trying stuff for a few days with little results. There are plenty of JIRA tickets for this fairly old version of JBM, but other instances of the same setup work fine, so I suspect that there is some sort of network, race condition, or env issue on this one server/DB combo.
JBoss Remoting 4.3.0.GA
JBoss Messaging 1.4.0.SP3
JBoss 4.3.0.GA
Thanks!
The issue was identified as being caused by problems with the Oracle database; the database instance was bounced to resolve it. Most likely, database performance was slow enough to cause a timing issue with message acknowledgement.
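For anyone diagnosing a similar symptom, wrapping the consumer with simple entry/exit timing (a sketch; InstrumentedListener is a hypothetical helper, not part of JBoss Messaging) makes it easy to confirm from the logs that all 15 MDB instances really are stuck inside onMessage(), for example blocked on a slow database, rather than never being invoked:

```java
import javax.jms.Message;
import javax.jms.MessageListener;

public class InstrumentedListener implements MessageListener {

    private final MessageListener delegate;

    public InstrumentedListener(MessageListener delegate) {
        this.delegate = delegate;
    }

    public void onMessage(Message message) {
        long start = System.currentTimeMillis();
        String thread = Thread.currentThread().getName();
        System.out.println("[" + thread + "] onMessage start");
        try {
            delegate.onMessage(message);
        } finally {
            long elapsed = System.currentTimeMillis() - start;
            System.out.println("[" + thread + "] onMessage end after " + elapsed + " ms");
        }
    }
}
```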