Database connection issue - App unable to recover - postgresql

I have a Java EE web app running on JBoss AS 7.2 connecting to a PostgreSQL 9.4 database (hosted on RDS).
The app is quite large and does a mixture of web page serving, API calls and scheduled tasks.
More and more frequently I am having to reboot the application server because the whole app has ground to a halt. Checking DB stats, I can see the number of connections has gone through the roof along with database CPU
(a big spike as the app stops responding; as soon as I restart JBoss it drops back).
The database logs show that the connection to the client has been lost:
LOG: could not send data to client: Broken pipe
FATAL: connection to client lost
The JBoss logs start filling up as transactions time out:
Caused by: javax.transaction.RollbackException: ARJUNA016063: The transaction is not active!
The only way to fix it is to restart JBoss, after which the number of connections goes back to normal.
My DB datasource configuration looks like this:
<datasource jta="false" jndi-name="java:/appWebDatasource" pool-name="jdbc/appWebDatasource" enabled="true" use-java-context="true" use-ccm="false">
    <connection-url>jdbc:postgresql://${web.db.url}/MyApp</connection-url>
    <driver>postgresql</driver>
    <security>
        <user-name>jboss</user-name>
        <password>******</password>
    </security>
    <validation>
        <check-valid-connection-sql>select 1</check-valid-connection-sql>
        <validate-on-match>false</validate-on-match>
        <background-validation>true</background-validation>
    </validation>
    <statement>
        <share-prepared-statements>false</share-prepared-statements>
    </statement>
</datasource>
I have been checking the pg_stat_activity table as soon as the issue occurs; there are no "idle in transaction" connections, they are all either idle or active.
So my question is: how can I configure JBoss or PostgreSQL to stop this increase in the number of connections that crashes the app?

You can cap the maximum number of connections by declaring the pool size you want to allow with the <max-pool-size> parameter.
You have to consider your application and choose an appropriate size to set in <max-pool-size>.

You should also use the connection validation mechanism in the datasource configuration, along with the <max-pool-size> parameter already mentioned by DaveB, as described in the documentation.
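For example, a minimal sketch of the same datasource with a capped pool and PostgreSQL-specific validation might look like this (the sizes and intervals are only illustrative, and the PostgreSQLValidConnectionChecker/PostgreSQLExceptionSorter classes assume the stock JBoss JDBC adapter extensions are available):
<datasource jta="false" jndi-name="java:/appWebDatasource" pool-name="jdbc/appWebDatasource" enabled="true" use-java-context="true" use-ccm="false">
    <connection-url>jdbc:postgresql://${web.db.url}/MyApp</connection-url>
    <driver>postgresql</driver>
    <pool>
        <!-- Cap the pool so a stalled app cannot keep opening new connections -->
        <min-pool-size>5</min-pool-size>
        <max-pool-size>30</max-pool-size>
    </pool>
    <security>
        <user-name>jboss</user-name>
        <password>******</password>
    </security>
    <validation>
        <!-- PostgreSQL-specific checker instead of a plain "select 1" -->
        <valid-connection-checker class-name="org.jboss.jca.adapters.jdbc.extensions.postgres.PostgreSQLValidConnectionChecker"/>
        <background-validation>true</background-validation>
        <background-validation-millis>60000</background-validation-millis>
        <exception-sorter class-name="org.jboss.jca.adapters.jdbc.extensions.postgres.PostgreSQLExceptionSorter"/>
    </validation>
    <timeout>
        <!-- Fail fast when the pool is exhausted, and return idle connections after 5 minutes -->
        <blocking-timeout-millis>30000</blocking-timeout-millis>
        <idle-timeout-minutes>5</idle-timeout-minutes>
    </timeout>
</datasource>
With a hard cap, a stuck app starts failing to obtain connections (which shows up clearly in the JBoss logs) instead of driving the database CPU up with ever more sessions.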

Related

max-pool-size for DB connections Keycloak version 11

Trying to investigate an issue regarding a Keycloak deployment.
From the documentation we're seeing that we should alter this property in the standalone.xml to allow more connections.
However, after altering the standalone.xml to this:
<datasource jndi-name="java:jboss/datasources/KeycloakDS" pool-name="KeycloakDS" enabled="true" use-java-context="true" use-ccm="true">
    <connection-url>jdbc:postgresql://${env.DB_ADDR:postgres}/${env.DB_DATABASE:keycloak}${env.JDBC_PARAMS:}</connection-url>
    <driver>postgresql</driver>
    <pool>
        <max-pool-size>200</max-pool-size>
    </pool>
When we allow the JMX connection to the management console, I notice this:
max pool size showing 19
Is there anything that could be overriding the max-pool-size setting we're using, or how would one go about debugging where it derives the max-pool-size from, if not from the standalone.xml?
I don't know why exactly you get this issue but I'll try to help to the best of my abilities.
Keycloak 11 uses WildFly 20, which offers several options to configure the datasource.
Please try to set
pool-use-strict-min to true
pool-prefill to true
min-pool-size to 200
initial-pool-size to 200
Also, monitor the number of open connections on your Postgres database, and check whether they match the JMX report you got.
Finally, if none of these changes your situation, I can only suggest trying another ManagedConnectionPool implementation (see the mcp attribute).
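In the standalone.xml those management attributes map onto the <pool> element of the datasource; a minimal sketch with your target of 200 (untested against your deployment, and assuming a WildFly 20 schema that accepts initial-pool-size) could look like:
<datasource jndi-name="java:jboss/datasources/KeycloakDS" pool-name="KeycloakDS" enabled="true" use-java-context="true" use-ccm="true">
    <connection-url>jdbc:postgresql://${env.DB_ADDR:postgres}/${env.DB_DATABASE:keycloak}${env.JDBC_PARAMS:}</connection-url>
    <driver>postgresql</driver>
    <pool>
        <!-- Pin min/initial/max to the same value so the pool is created at full size and stays there -->
        <min-pool-size>200</min-pool-size>
        <initial-pool-size>200</initial-pool-size>
        <max-pool-size>200</max-pool-size>
        <prefill>true</prefill>
        <use-strict-min>true</use-strict-min>
    </pool>
</datasource>
If the management console still reports a different value afterwards, that would point at some other configuration source (for example a startup CLI script or templated configuration) rewriting the datasource before the server boots.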

Restcomm - Solving SMSC GW 7.2 configuration failures

We configured the latest version (7.2) of the SMSC-GW to work on our server with the environment (Cassandra and such). However, after setting everything up, some failures are appearing (which did not appear in previous versions).
Firstly, when connecting the simulators and the gateway using the default settings (JSS7 <-> SMSCGW <-> SMPP)
JSS7 is connected and sending, but no response is received.
SMPP is connected to the SMSC-GW and the ESME is bound. SMPP tries to send to SS7 but receives a response PDU packet failure from the SMSC-GW.
I tried configuring DB routing rules, but that did not work.
Also, the log in the SMSC-GW server is frequently displaying the following message:
16:00:28,504 INFO [SchedulerResourceAdaptor] (pool-56-thread-1) Not all SBB are running now: ServicesDownList=[smscTxSmppServerServiceState, smscRxSmppServerServiceState, smscTxSipServerServiceState, smscRxSipServerServiceState, smscTxHttpServerServiceState, moServiceState, homeRoutingServiceState, mtServiceState, alertServiceState, chargingServiceState, ]
And the JSS7 management console GUI is displaying this (which looks wrong):
So are these the source of the SMSC-GW failures?
UPDATE: I found this error in the server.log
2017-02-02 10:57:42,005 WARN [org.mobicents.slee.container.deployment.jboss.SleeContainerDeployerImpl] (SLEE-InternalDeployer-thread-1) SLEE DUs not deployed, due to missing dependencies: file:/home/coreteam/kitchensink/restcomm-smsc-7.2.109/jboss-5.1.0.GA/server/simulator/deploy/smsc-services-du-7.2.109.jar/
Followed by:
EventTypeID[name=org.mobicents.smsc.slee.services.smpp.server.events.SS7_SEND_MT,vendor=org.mobicents,version=1.0]
ResourceAdaptorTypeID[name=PersistenceResourceAdaptorType,vendor=org.mobicents,version=1.0]
ResourceAdaptorTypeID[name=SchedulerResourceAdaptorType,vendor=org.mobicents,version=1.0]
SipRA
EventTypeID[name=org.mobicents.smsc.slee.services.smpp.server.events.SS7_SEND_RSDS,vendor=org.mobicents,version=1.0]
SchedulerResourceAdaptor
PersistenceResourceAdaptor
EventTypeID[name=org.mobicents.smsc.slee.services.smpp.server.events.SMPP_SM,vendor=org.mobicents,version=1.0]
EventTypeID[name=org.mobicents.smsc.slee.services.smpp.server.events.SS7_SM,vendor=org.mobicents,version=1.0]
EventTypeID[name=org.mobicents.smsc.slee.services.smpp.server.events.SIP_SM,vendor=org.mobicents,version=1.0]
2017-02-02 14:41:17,450 WARN [org.mobicents.slee.container.deployment.jboss.DeploymentManager] (main) Unable to INSTALL smsc-services-du-7.3.0-SNAPSHOT.jar right now. Waiting for dependencies to be resolved.
I solved it quite a while ago, but thought I would share. I simply installed the missing SipRA dependency by adding the following to the deploy-config.xml file:
<ra-entity
    resource-adaptor-id="ResourceAdaptorID[name=JainSipResourceAdaptor,vendor=net.java.slee.sip,version=1.2]"
    entity-name="SipRA">
    <properties>
        <property name="javax.sip.PORT" type="java.lang.Integer" value="5060" />
    </properties>
    <ra-link name="SipRA" />
</ra-entity>
In the $JBOSS_HOME/server/profile_name/deploy/restcomm-slee directory.
I set the port to some other value since that number was already taken by some other service.
The smsc-services-du-7.2.109.jar then installed automatically the next time I ran the SMSC-GW.

Jboss Wildfly not closing connections when not in use

I am using a JBoss server with the following datasource timeout and pool configuration:
<timeout>
    <idle-timeout-minutes>1</idle-timeout-minutes>
</timeout>
<pool>
    <min-pool-size>10</min-pool-size>
    <max-pool-size>30</max-pool-size>
    <prefill>true</prefill>
    <use-strict-min>false</use-strict-min>
    <flush-strategy>FailingConnectionOnly</flush-strategy>
</pool>
Now, as soon as the server reaches the maximum load (30 connections), the datasource details obtained from JBoss's CLI report Active Count = 30 and Available Count = 30.
However, even after the load drops back to a single request, the Active Count and Available Count still report 30.
Expected: the numbers should decrease, and ideally only 1 connection from the prefilled pool should be used instead of keeping all the connections alive!
I am seeing the following logs:
17:34:12,359 DEBUG [org.jboss.jca.core.connectionmanager.pool.idle.IdleRemover] (IdleRemover) Notifying pools, interval: 30000
Please help!
The connection pool implementation (IronJacamar) in WildFly 8 hands out connections in a FIFO, a.k.a. round-robin, manner. So receiving max-pool-size requests within one idle-timeout-minutes interval is enough to keep the pool from shrinking.
I had to use a decrementer policy to tell the connection pool to explicitly shrink by a size of n every idle-timeout-minutes interval.
A sample setting is below:
<pool>
    <min-pool-size>5</min-pool-size>
    <max-pool-size>20</max-pool-size>
    <prefill>false</prefill>
    <use-strict-min>true</use-strict-min>
    <capacity>
        <decrementer class-name="org.jboss.jca.core.connectionmanager.pool.capacity.SizeDecrementer">
            <config-property name="Size">1</config-property>
        </decrementer>
    </capacity>
</pool>
http://www.ironjacamar.org/doc/userguide/1.1/en-US/html/ch05.html#deploying_capacity_policies
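Combined with the idle-timeout-minutes from the question, a pool section that should shrink back towards its minimum on each IdleRemover pass might look roughly like this (the sizes are only illustrative, not a tested configuration):
<pool>
    <min-pool-size>10</min-pool-size>
    <max-pool-size>30</max-pool-size>
    <prefill>true</prefill>
    <use-strict-min>false</use-strict-min>
    <flush-strategy>FailingConnectionOnly</flush-strategy>
    <capacity>
        <!-- Drop one idle connection per pass, shrinking back towards min-pool-size -->
        <decrementer class-name="org.jboss.jca.core.connectionmanager.pool.capacity.SizeDecrementer">
            <config-property name="Size">1</config-property>
        </decrementer>
    </capacity>
</pool>
<timeout>
    <!-- Drives the IdleRemover passes visible in the DEBUG log above -->
    <idle-timeout-minutes>1</idle-timeout-minutes>
</timeout>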

Wildfly 9 - mod_cluster on TCP

We are currently testing a move from WildFly 8.2.0 to WildFly 9.0.0.CR1 (or CR2 built from a snapshot). The system is a cluster using mod_cluster and is running on VPSes, which in fact prevents it from using multicast.
On 8.2.0 we have been using the following mod_cluster configuration, which works well:
<mod-cluster-config proxy-list="1.2.3.4:10001,1.2.3.5:10001" advertise="false" connector="ajp">
<dynamic-load-provider>
<load-metric type="cpu"/>
</dynamic-load-provider>
</mod-cluster-config>
Unfortunately, on 9.0.0 proxy-list was deprecated and the server fails to start with an error. There is a terrible lack of documentation; however, after a couple of tries I discovered that proxy-list was replaced with proxies, which is a list of outbound-socket-bindings. Hence, the configuration looks like the following:
<mod-cluster-config proxies="mc-prox1 mc-prox2" advertise="false" connector="ajp">
<dynamic-load-provider>
<load-metric type="cpu"/>
</dynamic-load-provider>
</mod-cluster-config>
And the following should be added into the appropriate socket-binding-group (full-ha in my case):
<outbound-socket-binding name="mc-prox1">
<remote-destination host="1.2.3.4" port="10001"/>
</outbound-socket-binding>
<outbound-socket-binding name="mc-prox2">
<remote-destination host="1.2.3.5" port="10001"/>
</outbound-socket-binding>
So far so good. After this, the httpd cluster starts registering the nodes. However, I am getting errors from the load balancer. When I look into /mod_cluster-manager, I see a couple of Node REMOVED lines, and there are also many errors like:
ERROR [org.jboss.modcluster] (UndertowEventHandlerAdapter - 1) MODCLUSTER000042: Error MEM sending STATUS command to node1/1.2.3.4:10001, configuration will be reset: MEM: Can't read node
In the log of mod_cluster there are the equivalent warnings:
manager_handler STATUS error: MEM: Can't read node
As far as I understand, the problem is that although WildFly/mod_cluster is able to connect to httpd/mod_cluster, it does not work the other way around. Unfortunately, even after extensive effort I am stuck.
Could someone help with setting mod_cluster for Wildfly 9.0.0 without advertising? Thanks a lot.
I ran into the Node REMOVED issue too.
I managed to solve it by using the following as the instance-id:
<subsystem xmlns="urn:jboss:domain:undertow:2.0" instance-id="${jboss.server.name}">
I hope this will help someone else too ;)
There is no need for any unnecessary effort or uneasiness about static proxy configuration. Each WildFly distribution comes with XSD sheets that describe the XML subsystem configuration. For instance, with WildFly 9.x, it's:
WILDFLY_DIRECTORY/docs/schema/jboss-as-mod-cluster_2_0.xsd
It says:
<xs:attribute name="proxies" use="optional">
<xs:annotation>
<xs:documentation>List of proxies for mod_cluster to register with defined by outbound-socket-binding in socket-binding-group.</xs:documentation>
</xs:annotation>
<xs:simpleType>
<xs:list itemType="xs:string"/>
</xs:simpleType>
</xs:attribute>
The following setup works out of the box:
Download wildfly-9.0.0.CR1.zip or build with ./build.sh from sources
Let's assume you have 2 boxes: an Apache HTTP Server with mod_cluster acting as a load-balancing proxy, and your WildFly server acting as a worker. Make sure both servers can access each other on both the MCMP-enabled VirtualHost's address and port (Apache HTTP Server side) and on the WildFly AJP and HTTP connector side. The common mistake is to bind WildFly to localhost; it then reports its address as localhost to the Apache HTTP Server residing on a different box, which makes it impossible for it to contact the WildFly server back. The communication is bidirectional.
This is my configuration diff from the default wildfly-9.0.0.CR1.zip.
328c328
< <mod-cluster-config advertise-socket="modcluster" connector="ajp" advertise="false" proxies="my-proxy-one">
---
> <mod-cluster-config advertise-socket="modcluster" connector="ajp">
384c384
< <subsystem xmlns="urn:jboss:domain:undertow:2.0" instance-id="worker-1">
---
> <subsystem xmlns="urn:jboss:domain:undertow:2.0">
435c435
< <socket-binding-group name="standard-sockets" default-interface="public" port-offset="${jboss.socket.binding.port-offset:102}">
---
> <socket-binding-group name="standard-sockets" default-interface="public" port-offset="${jboss.socket.binding.port-offset:0}">
452,454d451
< <outbound-socket-binding name="my-proxy-one">
< <remote-destination host="10.10.2.4" port="6666"/>
< </outbound-socket-binding>
456c453
< </server>
---
> </server>
Changes explanation
proxies="my-proxy-one", the outbound socket binding name; there could be more of them here.
instance-id="worker-1", the name of the worker, a.k.a. JVMRoute.
offset -- you can ignore it; it's just for my test setup. The offset does not apply to outbound socket bindings.
<outbound-socket-binding name="my-proxy-one"> - IP and port of the VirtualHost in Apache HTTP Server containing EnableMCPMReceive directive.
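Put together, the three hunks above correspond to a worker-side configuration along these lines (the host, port and worker name are just the example values from the diff; the load provider element is taken from the question's configuration):
<mod-cluster-config advertise-socket="modcluster" connector="ajp" advertise="false" proxies="my-proxy-one">
    <dynamic-load-provider>
        <load-metric type="cpu"/>
    </dynamic-load-provider>
</mod-cluster-config>

<subsystem xmlns="urn:jboss:domain:undertow:2.0" instance-id="worker-1">
    <!-- rest of the default undertow configuration unchanged -->
</subsystem>

<socket-binding-group name="standard-sockets" default-interface="public" port-offset="${jboss.socket.binding.port-offset:102}">
    <!-- default bindings unchanged -->
    <outbound-socket-binding name="my-proxy-one">
        <!-- address and port of the Apache VirtualHost carrying EnableMCPMReceive -->
        <remote-destination host="10.10.2.4" port="6666"/>
    </outbound-socket-binding>
</socket-binding-group>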
Conclusion
Generally, these MEM read / node error messages are related to network problems, e.g. WildFly can contact Apache, but Apache cannot contact WildFly back. Last but not least, it could happen that the Apache HTTP Server's configuration uses the PersistSlots directive and some substantial environment configuration change took place, e.g. a switch from mpm_prefork to mpm_worker. In this case, the MEM read error messages are not related to WildFly, but to the cached slotmem files in HTTPD/cache/mod_cluster that need to be deleted.
I'm certain it's network in your case though.
After a couple of weeks I got back to the problem and found the solution. The problem was, of course, in the configuration and had nothing to do with the particular version of WildFly. More specifically:
There were three nodes in the domain and three servers in each node. All nodes were launched with the following property:
-Djboss.node.name=nodeX
...where nodeX is the name of a particular node. However, it meant that all three servers in the node got the same name, which is exactly what confused the load balancer.
As soon as I removed this property, everything started to work.
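For reference, one way to keep worker names distinct without setting jboss.node.name at all is to rely on the per-server names in each host's host.xml; a minimal sketch (server names and port offsets are only placeholders) might look like:
<servers>
    <!-- Each server gets a unique name, which the node name (and hence the JVMRoute)
         falls back to when jboss.node.name is not overridden on the command line -->
    <server name="node1-server-one" group="main-server-group"/>
    <server name="node1-server-two" group="main-server-group">
        <socket-bindings port-offset="150"/>
    </server>
    <server name="node1-server-three" group="main-server-group">
        <socket-bindings port-offset="300"/>
    </server>
</servers>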

jboss-eap-5.1 heap out of memory

My system runs out of memory sometimes. I can see the below error in the logs every time the system is running out of heap memory:
Maximum number of threads (200) created for connector with address abc.com/192.168.1.45 and port 8080
Any ideas why this is happening?
JBoss is crashing due to the high number of threads created. When it tries to create a new one, the application stops responding and the application server starts to shut down.
Increasing the maxThreads parameter will resolve the issue. Do this incrementally; raising the value of maxThreads too much can result in performance problems such as:
High memory usage
General slowness due to the JVM being forced to context switch between many threads frequently
To increase maxThreads, edit JBOSS_EAP_DIST/jboss-as/server/PROFILE/deploy/jbossweb.sar/server.xml:
<!-- A HTTP/1.1 Connector on port 8080 -->
<Connector protocol="HTTP/1.1" port="8080" address="${jboss.bind.address}"
connectionTimeout="20000" redirectPort="8443" maxThreads="3000"
minSpareThreads="2000" maxKeepAliveRequests="-1" />
see also: Performance Tuning Guide - Chapter 2. Connectors