AppFabric Cache service crashes when accessed from a different machine - appfabric-cache

Having persuaded it all to work in my development environment, and then on a test server, I'm trying to set up AppFabric Cache on a server that will be in production.
So I have the AppFabricCache Service installed on what I'll call "cacheserver1" and a client installed on what I'll call "webserver1". When I connect to the cache from a client installed on cacheserver1, it works. When I connect to the cache from a client installed on webserver1 - or from my development machine, having opened the firewall - the cache service crashes.
Application: DistributedCacheService.exe
Framework Version: v4.0.30319
Description: The process was terminated due to an unhandled exception.
Exception Info: System.Runtime.CallbackException
Stack:
at System.Runtime.AsyncResult.Complete(Boolean)
at System.ServiceModel.Channels.ConnectionStream+IOAsyncResult.OnAsyncIOComplete(System.Object)
at System.ServiceModel.Channels.SocketConnection.OnSendAsync(System.Object, System.Net.Sockets.SocketAsyncEventArgs)
at System.Threading.ExecutionContext.RunInternal(System.Threading.ExecutionContext, System.Threading.ContextCallback, System.Object, Boolean)
at System.Threading.ExecutionContext.Run(System.Threading.ExecutionContext, System.Threading.ContextCallback, System.Object, Boolean)
at System.Threading.ExecutionContext.Run(System.Threading.ExecutionContext, System.Threading.ContextCallback, System.Object)
at System.Net.Sockets.SocketAsyncEventArgs.FinishOperationSuccess(System.Net.Sockets.SocketError, Int32, System.Net.Sockets.SocketFlags)
at System.Net.Sockets.SocketAsyncEventArgs.CompletionPortCallback(UInt32, UInt32, System.Threading.NativeOverlapped*)
at System.Threading._IOCompletionCallback.PerformIOCompletionCallback(UInt32, UInt32, System.Threading.NativeOverlapped*)
The "hosts" section of the ClusterConfig.xml reads:
<hosts>
<host replicationPort="22236" arbitrationPort="22235" clusterPort="22234"
hostId="990129399" size="849" leadHost="true" account="WIN-BLABLABLA\AppFabricCache"
cacheHostName="AppFabricCachingService" name="WIN-BLABLABLA"
cachePort="22233" />
</hosts>
The firewall is configured so that cacheserver1 accepts incoming connections on port 22233 from webserver1. They're both AWS machines, not on the same domain or anything... I had assumed that I could just talk over TCPIP.
How is a simple request crashing the whole service? This doesn't make me feel reassured about the robustness of the cache service.
Is this configuration how a Distributed Cache is meant to be used?
What do I need to do to make it work / get a more helpful error out of it?

Got it!
I needed to disable the Cache Security (which is fine, I'm just going to manage it with the firewalls).
I needed to add
to the config file on the client (webserver), and
<advancedProperties>
<securityProperties mode="None" protectionLevel="None">
<authorization>
<allow users="Everyone" />
</authorization>
</securityProperties>
</advancedProperties>
to ClusterConfig.xml on the cacheserver

Related

Infinispan (Red Hat Data Grid) in Openshift, with WebSphere Liberty

We're trying to use Red Hat Data Grid (RHDG)/Infinispan in our OCP (4.5.36) cluster. We have the latest official RHDG Operator installed and a Cache type cluster defined. (Which is apparently a k8s StatefulSet.)
I've then configured a WebSphere Liberty container/Deployment to try to use that Infinispan cluster for its sessions, as described in https://github.com/WASdev/ci.docker#session-caching.
Both the Infinispan cluster and the Liberty Deployment are in the same Project/namespace.
However, the Liberty container fails to connect, and the Infinispan containers are reporting several warnings of their own.
The Liberty container "client" log:
INFINISPAN_SERVICE_NAME(original): session-infinispan
INFINISPAN_SERVICE_NAME(normalized): SESSION_INFINISPAN
INFINISPAN_HOST: 172.30.137.86
INFINISPAN_PORT: 11222
INFINISPAN_USER: developer
INFINISPAN_PASS: <redacted>
Launching defaultServer (WebSphere Application Server 21.0.0.3/wlp-1.0.50.cl210320210309-1101) on Eclipse OpenJ9 VM, version 1.8.0_282-b08 (en_US)
[AUDIT ] CWWKE0001I: The server defaultServer has been launched.
[AUDIT ] CWWKE0100I: This product is licensed for development, and limited production use. The full license terms can be viewed here: https://public.dhe.ibm.com/ibmdl/export/pub/software/websphere/wasdev/license/base_ilan/ilan/21.0.0.3/lafiles/en.html
[AUDIT ] CWWKG0093A: Processing configuration drop-ins resource: /opt/ibm/wlp/usr/servers/defaultServer/configDropins/defaults/keystore.xml
[AUDIT ] CWWKG0093A: Processing configuration drop-ins resource: /opt/ibm/wlp/usr/servers/defaultServer/configDropins/overrides/infinispan-client-sessioncache.xml
[AUDIT ] CWWKZ0058I: Monitoring dropins for applications.
[AUDIT ] CWWKT0016I: Web application available (default_host): http://payment-engine-6dcc5b6d5-jclx2:9080/payment/
[ERROR ] ISPN004007: Exception encountered. Retry 10 out of 10
org.infinispan.client.hotrod.exceptions.TransportException:: ISPN004071: Connection to 172.30.137.86/172.30.137.86:11222 was closed while waiting for response.
[ERROR ] SESN0307E: An exception occurred when initializing the cache. The exception is: org.infinispan.client.hotrod.exceptions.TransportException:: org.infinispan.client.hotrod.exceptions.TransportException:: ISPN004071: Connection to 172.30.137.86/172.30.137.86:11222 was closed while waiting for response.
at org.infinispan.client.hotrod.impl.transport.netty.ActivationHandler.exceptionCaught(ActivationHandler.java:53)
at io.netty.channel.AbstractChannelHandlerContext.invokeExceptionCaught(AbstractChannelHandlerContext.java:300)
...
What looks like the relevant part of the Inifinispan container log:
03:40:18,628 WARN (SINGLE_PORT-ServerIO-4-2) [io.netty.handler.ssl.ApplicationProtocolNegotiationHandler] [id: 0xc39380c8, L:/10.254.0.248:11222 ! R:/10.254.2.65:32986] TLS handshake failed: io.netty.handler.ssl.NotSslRecordException: not an SSL/TLS record: a0061e21000003ffffffff0f0000
at io.netty.handler.ssl.SslHandler.decodeJdkCompatible(SslHandler.java:1254)
(Actually, there are several Infinispan startup WARNs, mostly about deprecated capabilities. But this is the only one with a stack trace, so I'm jumping to the conclusion that it might be "the culprit")
Also, this is the Infinispan Service, so you can see the IP and port match what the Liberty container is using:
Working through this on the Infinispan chat service, it does appear that there's incorrect or incomplete setup of SSL/TLS.
I had attempted to remove encryption in the Infinispan cluster, but I either didn't sufficiently restart components or you can't change it after the fact. Removing the cluster and recreating with it disabled, though, enabled the Liberty communication to work.
The following CR YAML works:
apiVersion: infinispan.org/v1
kind: Infinispan
metadata:
name: session-infinispan
spec:
replicas: 1
service:
type: Cache
security:
endpointEncryption:
type: None
Now to pursue what's missing from the Liberty setup to make use of SSL correctly. The Infinispan chat conversation says that this Liberty XML setup from the official image:
<server>
<featureManager>
<feature>sessionCache-1.0</feature>
</featureManager>
<httpSessionCache libraryRef="InfinispanLib">
<properties infinispan.client.hotrod.server_list="${INFINISPAN_HOST}:${INFINISPAN_PORT}"/>
<properties infinispan.client.hotrod.marshaller="org.infinispan.commons.marshall.JavaSerializationMarshaller"/>
<properties infinispan.client.hotrod.java_serial_whitelist=".*"/>
<properties infinispan.client.hotrod.auth_username="${INFINISPAN_USER}"/>
<properties infinispan.client.hotrod.auth_password="${INFINISPAN_PASS}"/>
<properties infinispan.client.hotrod.auth_realm="default"/>
<properties infinispan.client.hotrod.sasl_mechanism="DIGEST-MD5"/>
<properties infinispan.client.hotrod.auth_server_name="infinispan"/>
</httpSessionCache>
<httpSessionCache enableBetaSupportForInfinispan="true"/> <!-- TODO remove once no longer gated -->
<library id="InfinispanLib">
<fileset dir="${shared.resource.dir}/infinispan" includes="*.jar"/>
</library>
</server>
Needs the following properties added:
# Encryption
infinispan.client.hotrod.sni_host_name=$SERVICE_HOSTNAME
# Path to the TLS certificate.
# Clients automatically generate trust stores from certificates.
infinispan.client.hotrod.trust_store_path=/var/run/secrets/kubernetes.io/serviceaccount/service-ca.crt

Database connection issue - App unable to recover

I have a Java EE Web App running on JBoss AS 7.2 connecting to a Postgresql 9.4 database (hosted on RDS).
The App is quite large and does a mixture of web page serving, API calls and Scheduled Tasks
More and more frequently I am having to reboot the application server as the whole app has ground to a halt, checking DB stats I can see the number of connections has gone through the roof along with database CPU
(big spike as app stops responding, soon as I restart Jboss it drops back)
The database logs show that the connection to the client has been lost:
LOG: could not send data to client: Broken pipe
FATAL: connection to client lost
The jboss logs start filling up as transactions time-out...
Caused by: javax.transaction.RollbackException: ARJUNA016063: The transaction is not active!
The only way to fix is to restart JBoss and the number of connections goes back to normal.
My DB datasource configuration looks like this..
<datasource jta="false" jndi-name="java:/appWebDatasource" pool-name="jdbc/appWebDatasource" enabled="true" use-java-context="true" use-ccm="false">
<connection-url>jdbc:postgresql://${web.db.url}/MyApp</connection-url>
<driver>postgresql</driver>
<security>
<user-name>jboss</user-name>
<password>******</password>
</security>
<validation>
<check-valid-connection-sql>select 1</check-valid-connection-sql>
<validate-on-match>false</validate-on-match>
<background-validation>true</background-validation>
</validation>
<statement>
<share-prepared-statements>false</share-prepared-statements>
</statement>
</datasource>
I have been checking the pg_stat_activity table as soon as the issue occurs and there are no idle in transaction connections, they are all either idle or active
So my question is, how to configure JBoss or Postgresql in a way to stop this increase in number of connections that crashes the app??
You can have a cap on the max number of connections by declaring the max pool size you want to allow with this paramater <max-pool-size>
You have to consider your application and choose an appropriate size to set in <max-pool-size>
As you need to use the validation checker mechanism also along with parameter already mentioned by DaveB in data source configuration, given in the doc.

Restcomm - Solving SMSC GW 7.2 configuration failures

We configured the latest version (7.2) SMSC-GW to work on on our server with the environment (cassandra and such). However, after setting up everything. Some failures are appearing (which did not appear in previous versions).
Firstly, when connecting the simulators and the gateway using the default settings (JSS7 <-> SMSCGW <-> SMPP)
JSS7 is connected and sending, but no response is received.
SMPP is connected to SMSC-GW and the EMSE is bound. SMPP tries to send to SS7 but receives a response PDU packet failure from the SMSC-GW
I tried configuring DB routing rules, but that did not work.
Also, the log in the SMSC-GW server is frequently displaying the following message:
16:00:28,504 INFO [SchedulerResourceAdaptor] (pool-56-thread-1) Not all SBB are running now: ServicesDownList=[smscTxSmppServerServiceState, smscRxSmppServerServiceState, smscTxSipServerServiceState, smscRxSipServerServiceState, smscTxHttpServerServiceState, moServiceState, homeRoutingServiceState, mtServiceState, alertServiceState, chargingServiceState, ]
And the JSS7 management console GUI is displaying this (which looks wrong):
So are these the source of the SMSC-GW failures?
UPDATE: I found this error in the server.log
2017-02-02 10:57:42,005 WARN [org.mobicents.slee.container.deployment.jboss.SleeContainerDeployerImpl] (SLEE-InternalDeployer-thread-1) SLEE DUs not deployed, due to missing dependencies: file:/home/coreteam/kitchensink/restcomm-smsc-7.2.109/jboss-5.1.0.GA/server/simulator/deploy/smsc-services-du-7.2.109.jar/
Followed by:
EventTypeID[name=org.mobicents.smsc.slee.services.smpp.server.events.SS7_SEND_MT,vendor=org.mobicents,version=1.0]
ResourceAdaptorTypeID[name=PersistenceResourceAdaptorType,vendor=org.mobicents,version=1.0]
ResourceAdaptorTypeID[name=SchedulerResourceAdaptorType,vendor=org.mobicents,version=1.0]
SipRA
EventTypeID[name=org.mobicents.smsc.slee.services.smpp.server.events.SS7_SEND_RSDS,vendor=org.mobicents,version=1.0]
SchedulerResourceAdaptor^M
PersistenceResourceAdaptor^M
EventTypeID[name=org.mobicents.smsc.slee.services.smpp.server.events.SMPP_SM,vendor=org.mobicents,version=1.0]
EventTypeID[name=org.mobicents.smsc.slee.services.smpp.server.events.SS7_SM,vendor=org.mobicents,version=1.0]
EventTypeID[name=org.mobicents.smsc.slee.services.smpp.server.events.SIP_SM,vendor=org.mobicents,version=1.0]
2017-02-02 14:41:17,450 WARN [org.mobicents.slee.container.deployment.jboss.DeploymentManager] (main) Unable to INSTALL smsc-services-du-7.3.0-SNAPSHOT.jar right now. Waiting for dependencies to be resolved.
Solved it quite a while ago, but thought I would share. I just simply installed the SipRA missing dependency by adding the following in the deploy-config.xml file:
<ra-entity
resource-adaptor-id="ResourceAdaptorID[name=JainSipResourceAdaptor,vendor=net.java.slee.sip,version=1.2]"
entity-name="SipRA">
<properties>
<property name="javax.sip.PORT" type="java.lang.Integer" value="5060" />
</properties>
<ra-link name="SipRA" />
In the $JBOSS_HOME/server/profile_name/deploy/restcomm-slee directory.
I set the port to some other value since that number was already taken by some other service.
The smsc-services-du-7.2.109.jar then installed automatically the next time I ran the SMSC-GW.

Wildfly 9 - mod_cluster on TCP

We are currently testing to move from Wildfly 8.2.0 to Wildfly 9.0.0.CR1 (or CR2 built from snapshot). The system is a cluster using mod_cluster and is running on VPS what in fact prevents it from using multicast.
On 8.2.0 we have been using the following configuration of the modcluster that works well:
<mod-cluster-config proxy-list="1.2.3.4:10001,1.2.3.5:10001" advertise="false" connector="ajp">
<dynamic-load-provider>
<load-metric type="cpu"/>
</dynamic-load-provider>
</mod-cluster-config>
Unfortunately, on 9.0.0 proxy-list was deprecated and the start of the server will finish with an error. There is a terrible lack of documentation, however after a couple of tries I have discovered that proxy-list was replaced with proxies that are a list of outbound-socket-bindings. Hence, the configuration looks like the following:
<mod-cluster-config proxies="mc-prox1 mc-prox2" advertise="false" connector="ajp">
<dynamic-load-provider>
<load-metric type="cpu"/>
</dynamic-load-provider>
</mod-cluster-config>
And the following should be added into the appropriate socket-binding-group (full-ha in my case):
<outbound-socket-binding name="mc-prox1">
<remote-destination host="1.2.3.4" port="10001"/>
</outbound-socket-binding>
<outbound-socket-binding name="mc-prox2">
<remote-destination host="1.2.3.5" port="10001"/>
</outbound-socket-binding>
So far so good. After this, the httpd cluster starts registering the nodes. However I am getting errors from load balancer. When I look into /mod_cluster-manager, I see a couple of Node REMOVED lines and there are also many many errors like:
ERROR [org.jboss.modcluster] (UndertowEventHandlerAdapter - 1) MODCLUSTER000042: Error MEM sending STATUS command to node1/1.2.3.4:10001, configuration will be reset: MEM: Can't read node
In the log of mod_cluster there are the equivalent warnings:
manager_handler STATUS error: MEM: Can't read node
As far as I understand, the problem is that although wildfly/modcluster is able to connect to httpd/mod_cluster, it does not work the other way. Unfortunately, even after an extensive effort I am stuck.
Could someone help with setting mod_cluster for Wildfly 9.0.0 without advertising? Thanks a lot.
I ran into the Node Removed issue to.
I managed to solve it by using the following as instance-id
<subsystem xmlns="urn:jboss:domain:undertow:2.0" instance-id="${jboss.server.name}">
I hope this will help someone else to ;)
There is no need for any unnecessary effort or uneasiness about static proxy configuration. Each WildFly distribution comes with xsd sheets that describe xml subsystem configuration. For instance, with WildFly 9x, it's:
WILDFLY_DIRECTORY/docs/schema/jboss-as-mod-cluster_2_0.xsd
It says:
<xs:attribute name="proxies" use="optional">
<xs:annotation>
<xs:documentation>List of proxies for mod_cluster to register with defined by outbound-socket-binding in socket-binding-group.</xs:documentation>
</xs:annotation>
<xs:simpleType>
<xs:list itemType="xs:string"/>
</xs:simpleType>
</xs:attribute>
The following setup works out of box
Download wildfly-9.0.0.CR1.zip or build with ./build.sh from sources
Let's assume you have 2 boxes, Apache HTTP Server with mod_cluster acting as a load balancing proxy and your WildFly server acting as a worker. Make sure botch servers can access each other on both MCMP enabled VirtualHost's address and port (Apache HTTP Server side) and on WildFly AJP and HTTP connector side. The common mistake is to binf WildFLy to localhost; it then reports its addess as localhost to the Apache HTTP Server residing on a dofferent box, which makes it impossible for it to contact WildFly server back. The communication is bidirectional.
This is my configuration diff from the default wildfly-9.0.0.CR1.zip.
328c328
< <mod-cluster-config advertise-socket="modcluster" connector="ajp" advertise="false" proxies="my-proxy-one">
---
> <mod-cluster-config advertise-socket="modcluster" connector="ajp">
384c384
< <subsystem xmlns="urn:jboss:domain:undertow:2.0" instance-id="worker-1">
---
> <subsystem xmlns="urn:jboss:domain:undertow:2.0">
435c435
< <socket-binding-group name="standard-sockets" default-interface="public" port-offset="${jboss.socket.binding.port-offset:102}">
---
> <socket-binding-group name="standard-sockets" default-interface="public" port-offset="${jboss.socket.binding.port-offset:0}">
452,454d451
< <outbound-socket-binding name="my-proxy-one">
< <remote-destination host="10.10.2.4" port="6666"/>
< </outbound-socket-binding>
456c453
< </server>
---
> </server>
Changes explanation
proxies="my-proxy-one", outbound socket binding name; could be more of them here.
instance-id="worker-1", the name of the worker, a.k.a. JVMRoute.
offset -- you could ignore, it's just for my test setup. Offset does not apply to outbound socket bindings.
<outbound-socket-binding name="my-proxy-one"> - IP and port of the VirtualHost in Apache HTTP Server containing EnableMCPMReceive directive.
Conclusion
Generally, these MEM read / node error messages are related to network problems, e.g. WildFly can contact Apache, but Apache cannot contact WildFly back. Last but not least, it could happen that the Apache HTTP Server's configuration uses PersistSlots directive and some substantial enviroment conf change took place, e.g. switch from mpm_prefork to mpm_worker. In this case, MEM Read error messages are not realted to WildFly, but to the cached slotmem files in HTTPD/cache/mod_custer that need to be deleted.
I'm certain it's network in your case though.
After a couple of weeks I got back to the problem and found the solution. The problem was - of course - in configuration and had nothing in common with the particular version of Wildfly. Mode specifically:
There were three nodes in the domain and three servers in each node. All nodes were launched with the following property:
-Djboss.node.name=nodeX
...where nodeX is the name of a particular node. However, it meant that all three servers in the node get the same name, which is exactly what confused the load balancer.
As soon as I have removed this property, everything started to work.

NServiceBus MSMQ Send question

I have trouble sending a message via NServiceBus. I have an ASP.Net MVC web app, developing on Win7 x64, I have configured my web.config as
<MsmqTransportConfig InputQueue="worker" ErrorQueue="error" NumberOfWorkerThreads="1" MaxRetries="5" />
<UnicastBusConfig>
<MessageEndpointMappings>
<add Messages="PricingInformation.Messages" Endpoint="worker2" />
</MessageEndpointMappings> </UnicastBusConfig>
In application_start I wire up the following:
var bus = NServiceBus.Configure.WithWeb()
.StructureMapBuilder()
.XmlSerializer()
.MsmqTransport()
.IsTransactional(false)
.PurgeOnStartup(false)
.UnicastBus()
.ImpersonateSender(false)
.CreateBus()
.Start();
When the action I'm interested happens in the app I fire
public override void HandleEvent(SupplierPricingUpdatedEvent updatedEvent)
{
bus.Send(new ModelSupplierDetailsUpdatedMessage() {Id = updatedEvent.Id})
return;
}
ModelSupplierDetailsUpdatedMessage is simple class in PricingInformation.Messages using interface marker IMessage and decorated with Serializable attribute.
The MSMQ queues are setup, not transactional, and everyone including NETWORK SERVICE and IIS_IUSRS have full control (in deseperate troubleshooting measures)
log4net shows the following:
DEBUG NServiceBus.Utils.MsmqUtilities 14 - Checking if queue exists: worker.
DEBUG NServiceBus.Utils.MsmqUtilities 14 - Checking if queue exists: error.
DEBUG NServiceBus.Unicast.UnicastBus Worker.15 - Calling 'HandleBeginMessage' on NServiceBus.SagaPersisters.NHibernate.NHibernateMessageModule
INFO NServiceBus.Unicast.UnicastBus Worker.15 - worker initialized.
DEBUG NServiceBus.Unicast.UnicastBus Worker.15 - Calling 'HandleEndMessage' on NServiceBus.SagaPersisters.NHibernate.NHibernateMessageModule
DEBUG NServiceBus.Unicast.UnicastBus 9 - Sending message PricingInformation.Messages.ModelSupplierDetailsUpdatedMessage, PricingInformation.Messages, Version=1.0.0.0, Culture=neutral, PublicKeyToken=null with ID 2c642672-1bf1-48d4-a90f-734e2fdd726d\8267 to destination worker2.
Yet no matter what I try, and what I tweak (been three hours at it already) the message never appears in the queue. I cant find an exception and i have debug level logging on everything.. Its probably something simple help
The problem is that if you manually set up your queues as non-transactional, then it won't work. Setting the NServiceBus "IsTransactional" property to false doesn't mean you can work with non-transactional queues.
Please try deleting the queues and recreating them as transactional or, if you're using the beta of v2.0, just letting NServiceBus create the queues for you.