Ehcache JGroups replication issue: [FIND_INITIAL_MBRS FIND_MBRS] are required by GMS, but not provided - Kubernetes

Our project has just moved from a traditional Tomcat deployment to Kubernetes. Previously, Ehcache replication worked via the RMI replicator, but RMI replication does not work in Kubernetes either, so I am now trying to replicate Ehcache in Kubernetes using JGroups. Everything works when running outside Kubernetes, but during deployment I get the log below.
I am using K3s, and the application is built on Spring Boot.
I have followed this tutorial:
https://github.com/kunal-bhatia/ehcache-jgroups-demo
ERROR n.s.e.d.j.JGroupsCacheManagerPeerProvider:140 - Failed to create JGroups Channel, replication will not function. JGroups properties:
null
java.lang.Exception: events [FIND_INITIAL_MBRS FIND_MBRS ] are required by GMS, but not provided by any of the protocols below it
at org.jgroups.stack.Configurator.sanityCheck(Configurator.java:503)
at org.jgroups.stack.Configurator.connectProtocols(Configurator.java:223)
at org.jgroups.stack.Configurator.setupProtocolStack(Configurator.java:123)
at org.jgroups.stack.Configurator.setupProtocolStack(Configurator.java:57)
at org.jgroups.stack.ProtocolStack.setup(ProtocolStack.java:476)
at org.jgroups.JChannel.init(JChannel.java:852)
My tcp.xml file:
<config xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
        xmlns="urn:org:jgroups"
        xsi:schemaLocation="urn:org:jgroups http://www.jgroups.org/schema/jgroups.xsd">
    <TCP
        bind_addr="match-interface:eth0,match-interface:lo"
        bind_port="7810"
        recv_buf_size="5M"
        send_buf_size="1M"
        max_bundle_size="64K"
        enable_diagnostics="true"
        thread_naming_pattern="cl"
        thread_pool.min_threads="0"
        thread_pool.max_threads="500"
        thread_pool.keep_alive_time="30000"/>
    <org.jgroups.protocols.kubernetes.KUBE_PING
        namespace="${KUBE_NAMESPACE:ehcache-demo}"/>
    <MERGE3 max_interval="30000"
            min_interval="10000"/>
    <VERIFY_SUSPECT timeout="1500"/>
    <BARRIER/>
    <pbcast.NAKACK2 xmit_interval="500"
                    xmit_table_num_rows="100"
                    xmit_table_msgs_per_row="2000"
                    xmit_table_max_compaction_time="30000"
                    use_mcast_xmit="false"
                    discard_delivered_msgs="true"/>
    <UNICAST3
        xmit_table_num_rows="100"
        xmit_table_msgs_per_row="1000"
        xmit_table_max_compaction_time="30000"/>
    <pbcast.STABLE stability_delay="1000"
                   desired_avg_gossip="50000"
                   max_bytes="8m"/>
    <pbcast.GMS print_local_addr="true"
                join_timeout="3000"
                view_bundling="true"/>
    <MFC max_credits="2M"
         min_threshold="0.4"/>
    <FRAG2 frag_size="60K"/>
    <pbcast.STATE_TRANSFER/>
    <CENTRAL_LOCK/>
    <COUNTER/>
</config>
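For context, GMS obtains the initial membership through a discovery protocol (PING, TCPPING, KUBE_PING, DNS_PING, ...) placed between the transport and GMS. The error says that no protocol below GMS declared the FIND_INITIAL_MBRS/FIND_MBRS events, which can happen when the KUBE_PING artifact does not match the JGroups version on the classpath. As a sanity check, a static TCPPING entry can be swapped in for the KUBE_PING element to verify the rest of the stack (the initial_hosts values here are placeholders, not from my deployment):
    <!-- sketch: fallback discovery in place of KUBE_PING; hosts are placeholders -->
    <TCPPING initial_hosts="${jgroups.tcpping.initial_hosts:localhost[7810],localhost[7811]}"
             port_range="1"/>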

Related

SSL handshake failure for postgres connection over cloud in liberty

I have been stuck on this issue longer than I want to admit. I am trying to use a Postgres DB connection for my small Spring MVC project on Liberty. My server.xml looks like this:
<server description="new server">
    <!-- Enable features -->
    <featureManager>
        <feature>jdbc-4.2</feature>
        <feature>jsp-2.3</feature>
        <feature>localConnector-1.0</feature>
        <feature>servlet-4.0</feature>
        <feature>ldapRegistry-3.0</feature>
        <feature>appSecurity-3.0</feature>
        <feature>transportSecurity-1.0</feature>
    </featureManager>

    <ssl id="defaultSSLSettings" keyStoreRef="defaultKeyStore" trustDefaultCerts="true" />
    <keyStore id="defaultKeyStore"
        location="/opt/ibm/wlp/usr/servers/defaultServer/resources/security/key.jks"
        password="changeIt"/>

    <dataSource id="DefaultDataSource" jndiName="jdbc/postgres"
        transactional="true" type="javax.sql.ConnectionPoolDataSource">
        <jdbcDriver libraryRef="PostgresLib" />
        <properties databaseName="clouddb"
            password=""
            serverName="ea957de9-2271-4d6c-999e-f2c250575850.budepemd0im5pmu4u60g.databases.appdomain.cloud"
            user="admin" portNumber="30352" />
    </dataSource>

    <library id="PostgresLib">
        <fileset
            dir="C:/Users/AkanchaSingh/Desktop/iit-test-app/test-app"
            includes="postgresql-42.2.5.jre6.jar" />
    </library>

    <httpEndpoint host="*" httpPort="9080" httpsPort="9443"
        id="defaultHttpEndpoint">
        <tcpOptions soReuseAddr="true" />
        <httpOptions maxKeepAliveRequests="-1" />
    </httpEndpoint>

    <applicationManager autoExpand="true"
        startTimeout="600" stopTimeout="600"></applicationManager>
    <applicationMonitor updateTrigger="mbean" />

    <webApplication autoStart="true" contextRoot="test"
        id="test" location="/opt/ibm/wlp/usr/servers/defaultServer/test.war"
        name="test">
    </webApplication>
</server>
I have tried connecting to Postgres by all methods, e.g. a datasource and also a plain driver connection:
Properties info = new Properties();
String url = "jdbc:postgresql://ea957de9-2271-4d6c-999e-f2c250575850.budepemd0im5pmu4u60g.databases.appdomain.cloud:30352/clouddb";
info.setProperty("user", "");
info.setProperty("password", "");
info.setProperty("ssl", "true");
info.setProperty("sslfactory", "org.postgresql.ssl.SingleCertValidatingFactory");
// loadFile is my helper that reads the certificate file contents into a String
info.setProperty("sslfactoryarg", loadFile("/opt/ibm/wlp/usr/servers/defaultServer/resources/security/PGSSLROOTCERT.crt"));
Connection conn = DriverManager.getConnection(url, info);
Even after providing the root.crt from the Postgres DB connection page, I keep getting:
[err] org.postgresql.util.PSQLException: SSL error: Received fatal alert: handshake_failure
[err] at org.postgresql.ssl.MakeSSL.convert(MakeSSL.java:42)
[err] at org.postgresql.core.v3.ConnectionFactoryImpl.enableSSL(ConnectionFactoryImpl.java:435)
[err] at org.postgresql.core.v3.ConnectionFactoryImpl.tryConnect(ConnectionFactoryImpl.java:94)
[err] at org.postgresql.core.v3.ConnectionFactoryImpl.openConnectionImpl(ConnectionFactoryImpl.java:192)
[err] at org.postgresql.core.ConnectionFactory.openConnection(ConnectionFactory.java:49)
[err] at org.postgresql.jdbc.PgConnection.(PgConnection.java:195)
I have also tried putting the certificate in my keystore and truststore; nothing seems to work in this case. I can connect successfully to the Postgres DB locally via my IDE and also through psql, but as soon as I dockerize the app and run it, it throws this exception.
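For reference, the keystore import was done with keytool, roughly like this (the alias is a placeholder):
keytool -importcert -alias postgres-root -file PGSSLROOTCERT.crt -keystore key.jks -storepass changeIt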
Dockerfile:
FROM websphere-liberty:19.0.0.12-full-java8-ibmjava
ENTRYPOINT ["/opt/ibm/wlp/bin/server","run","defaultServer"]
USER root
EXPOSE 9080
COPY --chown=1001:0 server.xml /opt/ibm/wlp/usr/servers/defaultServer/
RUN mkdir -p /root/.postgresql
COPY --chown=1001:0 root.crt /root/.postgresql/
COPY --chown=1001:0 key.jks /opt/ibm/wlp/usr/servers/defaultServer/resources/security/
RUN chmod -R 777 /opt/ibm/wlp/usr/servers/defaultServer/resources/security
COPY --chown=1001:0 target/test.war /opt/ibm/wlp/usr/servers/defaultServer/
RUN installUtility install --acceptLicense defaultServer
RUN chmod -R 777 /opt/ibm/wlp/output/defaultServer/workarea
RUN chmod a+rwx /opt/ibm/wlp/output/defaultServer
OpenLiberty has an automated test bucket that runs with PostgreSQL in a Docker container, where two of the Liberty servers use SSL successfully. Here is one of them:
https://github.com/OpenLiberty/open-liberty/blob/integration/dev/com.ibm.ws.jdbc_fat_postgresql/publish/servers/server-PostgreSQLSSLTest/server.xml
Try using the <properties.postgresql> element under your <dataSource> instead of the generic <properties>. The former has additional SSL-related properties on it, as illustrated in the server.xml of the test case.
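For illustration, the datasource could then look something like this; the SSL attribute names below are assumptions mirroring the PostgreSQL driver's connection properties (ssl, sslmode, sslrootcert), so verify them against the linked test server.xml:
<dataSource id="DefaultDataSource" jndiName="jdbc/postgres">
    <jdbcDriver libraryRef="PostgresLib" />
    <!-- attribute names are assumptions based on the driver's properties; check the linked test config -->
    <properties.postgresql databaseName="clouddb" portNumber="30352"
        serverName="ea957de9-2271-4d6c-999e-f2c250575850.budepemd0im5pmu4u60g.databases.appdomain.cloud"
        user="admin" password=""
        ssl="true" sslmode="verify-ca"
        sslrootcert="/opt/ibm/wlp/usr/servers/defaultServer/resources/security/PGSSLROOTCERT.crt" />
</dataSource>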

Infinispan cluster nodes only see themselves and not the other instances running in Kubernetes nodes

I am trying to set up an Infinispan cache in my application, which runs on several nodes on Google Cloud Platform with Kubernetes and Docker.
Each of these caches shall share its data with the caches on the other nodes so they all have the same data available.
My problem is that the JGroups configuration doesn't seem to work the way I want, and the nodes don't see any of their siblings.
I tried several configurations, but the nodes always see only themselves and do not build up a cluster with the other ones.
I've tried some configurations from GitHub examples like https://github.com/jgroups-extras/jgroups-kubernetes or https://github.com/infinispan/infinispan-simple-tutorials
Here is my jgroups.xml:
<config xmlns="urn:org:jgroups"
        xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
        xsi:schemaLocation="urn:org:jgroups http://www.jgroups.org/schema/jgroups-4.0.xsd">
    <TCP bind_addr="${jgroups.tcp.address:127.0.0.1}"
         bind_port="${jgroups.tcp.port:7800}"
         enable_diagnostics="false"
         thread_naming_pattern="pl"
         send_buf_size="640k"
         sock_conn_timeout="300"
         bundler_type="no-bundler"
         logical_addr_cache_expiration="360000"
         thread_pool.min_threads="${jgroups.thread_pool.min_threads:0}"
         thread_pool.max_threads="${jgroups.thread_pool.max_threads:200}"
         thread_pool.keep_alive_time="60000"/>
    <org.jgroups.protocols.kubernetes.KUBE_PING
        port_range="1"
        namespace="${KUBERNETES_NAMESPACE:myGoogleCloudPlatformNamespace}"/>
    <MERGE3 min_interval="10000"
            max_interval="30000"/>
    <FD_SOCK />
    <!-- Suspect node `timeout` to `timeout + timeout_check_interval` millis after the last heartbeat -->
    <FD_ALL timeout="10000"
            interval="2000"
            timeout_check_interval="1000"/>
    <VERIFY_SUSPECT timeout="1000"/>
    <pbcast.NAKACK2 xmit_interval="100"
                    xmit_table_num_rows="50"
                    xmit_table_msgs_per_row="1024"
                    xmit_table_max_compaction_time="30000"
                    resend_last_seqno="true"/>
    <UNICAST3 xmit_interval="100"
              xmit_table_num_rows="50"
              xmit_table_msgs_per_row="1024"
              xmit_table_max_compaction_time="30000"/>
    <pbcast.STABLE stability_delay="500"
                   desired_avg_gossip="5000"
                   max_bytes="1M"/>
    <pbcast.GMS print_local_addr="false"
                join_timeout="${jgroups.join_timeout:5000}"/>
    <MFC max_credits="2m"
         min_threshold="0.40"/>
    <FRAG3 frag_size="8000"/>
</config>
And here is how I initialize the Infinispan cache (Kotlin):
import org.infinispan.configuration.cache.CacheMode
import org.infinispan.configuration.cache.ConfigurationBuilder
import org.infinispan.configuration.global.GlobalConfigurationBuilder
import org.infinispan.manager.DefaultCacheManager
import org.slf4j.LoggerFactory
import java.util.concurrent.TimeUnit

class MyCache<V : Any>(private val cacheName: String) {

    companion object {
        // logger added for completeness; the original snippet uses `log` without showing its declaration
        private val log = LoggerFactory.getLogger(MyCache::class.java)

        private var cacheManager = DefaultCacheManager(
            GlobalConfigurationBuilder()
                .transport().defaultTransport()
                .addProperty("configurationFile", "jgroups.xml")
                .build()
        )
    }

    private val backingCache = buildCache()

    // CacheKey is the project's own key type (not shown here)
    private fun buildCache(): org.infinispan.Cache<CacheKey, V> {
        val cacheConfiguration = ConfigurationBuilder()
            .expiration().lifespan(8, TimeUnit.HOURS)
            .clustering().cacheMode(CacheMode.REPL_ASYNC)
            .build()
        cacheManager.defineConfiguration(this.cacheName, cacheConfiguration)
        log.info("Started cache with name $cacheName. Found cluster members are ${cacheManager.clusterMembers}")
        return cacheManager.getCache(this.cacheName)
    }
}
Here is what the log says:
INFO o.i.r.t.jgroups.JGroupsTransport - ISPN000078: Starting JGroups channel ISPN
INFO o.j.protocols.kubernetes.KUBE_PING - namespace myNamespace set; clustering enabled
INFO org.infinispan.CLUSTER - ISPN000094: Received new cluster view for channel ISPN: [myNamespace-7d878d4c7b-cks6n-57621|0] (1) [myNamespace-7d878d4c7b-cks6n-57621]
INFO o.i.r.t.jgroups.JGroupsTransport - ISPN000079: Channel ISPN local address is myNamespace-7d878d4c7b-cks6n-57621, physical addresses are [127.0.0.1:7800]
I expect that on startup a new node finds the already existing ones and gets the data from them.
Currently, on startup every node sees only itself and nothing is shared.
Usually the first thing to do when you need help with JGroups/Infinispan is to enable trace-level logging.
The problem with KUBE_PING might be that the pod does not run under a proper service account, and therefore does not have the authorization token to access the Kubernetes master API. That is why the currently preferred way is to use DNS_PING and register a headless service. See this example.
Also, bind_addr is set to 127.0.0.1. This means members on different hosts won't be able to find each other. I suggest setting bind_addr, e.g. <TCP bind_addr="site_local" .../>.
See [1] for details.
[1] http://www.jgroups.org/manual4/index.html#Transport
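For reference, a minimal sketch of those two suggestions combined (the dns_query value is a placeholder and must match a headless Service that selects the pods):
<TCP bind_addr="site_local"
     bind_port="7800"/>
<!-- placeholder: DNS name of a headless Service covering the cluster pods -->
<dns.DNS_PING dns_query="my-headless-service.my-namespace.svc.cluster.local"/>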

Hazelcast Kubernetes plugin is not defined

I'm trying to create an OrientDB (version 3.0.10) cluster using Kubernetes. OrientDB uses Hazelcast (version 3.10.4) in its distributed mode, which is why I had to set up the KubernetesHazelcast plugin. I used this repository as an example.
I have created all the necessary configuration files and defined the hazelcast-kubernetes dependency (version 1.3.1) in my project's build.sbt file, and this dependency appears on the classpath.
However, the logs on each pod show this error message:
com.orientechnologies.orient.server.distributed.ODistributedStartupException: Error on starting distributed plugin
Caused by: com.hazelcast.config.properties.ValidationException: There is no discovery strategy factory to create 'DiscoveryStrategyConfig{properties={service-dns=orientdbservice2.default.svc.cluster.local, service-dns-timeout=10}, className='com.hazelcast.kubernetes.HazelcastKubernetesDiscoveryStrategy', discoveryStrategyFactory=null}' Is it a typo in a strategy classname? Perhaps you forgot to include implementation on a classpath?
So it looks like the Hazelcast Kubernetes dependency is set up in the wrong way. How can this error be fixed?
Here is my hazelcast.xml config file:
<properties>
    <property name="hazelcast.discovery.enabled">true</property>
</properties>
<network>
    <join>
        <multicast enabled="false"/>
        <tcp-ip enabled="false" />
        <discovery-strategies>
            <discovery-strategy enabled="true"
                class="com.hazelcast.kubernetes.HazelcastKubernetesDiscoveryStrategy">
                <properties>
                    <property name="service-dns">orientdbservice2.default.svc.cluster.local</property>
                    <property name="service-dns-timeout">10</property>
                </properties>
            </discovery-strategy>
        </discovery-strategies>
    </join>
</network>
For the cluster creation I use a StatefulSet with the OrientDB image and mount all the config files as config maps, roughly as sketched below. I am pretty sure the problem is not in my config files, since everything works fine with multicast instead of the DNS strategy. Also, there are no network problems in the Kubernetes cluster itself.
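The relevant part of the StatefulSet pod spec looks roughly like this (resource names are placeholders):
# placeholder names; the ConfigMap holds hazelcast.xml
containers:
  - name: orientdb
    image: orientdb:3.0.10
    volumeMounts:
      - name: hazelcast-config
        mountPath: /orientdb/config/hazelcast.xml
        subPath: hazelcast.xml
volumes:
  - name: hazelcast-config
    configMap:
      name: hazelcast-config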
First of all, the OrientDB version should be updated to the latest (3.0.10), which embeds the newest Hazelcast version. Also, I mounted the hazelcast-kubernetes.jar dependency file directly into the /orientdb/lib folder and it started to work properly. The HazelcastKubernetes plugin is discovered and the nodes join the cluster:
INFO [172.17.0.3]:5701 [orientdb-test-cluster-1] [3.10.4] Kubernetes Discovery activated resolver: DnsEndpointResolver [DiscoveryService]
INFO [172.17.0.3]:5701 [orientdb-test-cluster-1] [3.10.4] Activating Discovery SPI Joiner [Node]
INFO [172.17.0.3]:5701 [orientdb-test-cluster-1] [3.10.4] Starting 2 partition threads and 3 generic threads (1 dedicated for priority tasks) [OperationExecutorImpl]
Members {size:3, ver:3} [
Member [172.17.0.3]:5701 - hash
Member [172.17.0.4]:5701 - hash
Member [172.17.0.8]:5701 - hash
]
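If the jar is baked into the image rather than mounted, a Dockerfile along these lines achieves the same (the jar name assumes hazelcast-kubernetes 1.3.1; the base image tag is illustrative):
FROM orientdb:3.0.10
# copy the discovery plugin where OrientDB's embedded Hazelcast can load it
COPY hazelcast-kubernetes-1.3.1.jar /orientdb/lib/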

Can the reverseProxyEndpointPort on local dev cluster run?

I am attempting to enable the reverse proxy functionality of Service Fabric on a local 5-node dev cluster. This functionality seems to work fine on a deployed cluster, but not on the dev cluster.
Both the deployed and the local dev cluster are on 5.4.145.9494.
The local dev machine runs VS 2015 with Service Fabric SDK 2.4.145.9494.
I have referenced "How to configure and enable Azure Service Fabric Reverse Proxy for an existing on-premises cluster?",
but the ClusterManifestTemplate (specifically the W7 one in my case) doesn't seem to reference these values, only the "older" ApplicationGateway/Http section.
If I enable
<Section Name="ApplicationGateway/Http">
    <Parameter Name="IsEnabled" Value="true" />
</Section>
and then deploy an application, my (local) cluster crashes after a few minutes.
Current node type example for reference:
<NodeType Name="NodeType0">
    <Endpoints>
        <ClientConnectionEndpoint Port="19000" />
        <LeaseDriverEndpoint Port="19001" />
        <ClusterConnectionEndpoint Port="19002" />
        <HttpGatewayEndpoint Port="19080" Protocol="http" />
        <ServiceConnectionEndpoint Port="19006" />
        <HttpApplicationGatewayEndpoint Port="19081" Protocol="http" />
        <ApplicationEndpoints StartPort="30001" EndPort="31000" />
    </Endpoints>
</NodeType>
Additional information:
The Windows Event Viewer shows:
HostedService: _Node_0 on node id bf865279ba277deb864a976fbf4c200e terminated unexpectedly with code 3221225781 and process name FabricApplicationGateway.exe
Port usage:
netstat -anob | find "19081"
<no return>
Check your other node types. On a local cluster, each endpoint needs a unique port on each node type because it's all running on one machine. I'm guessing something else is already using port 19081 on another node type.
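For example, each node type on the local cluster needs its own reverse proxy port, along these lines (the second node type and its port value are illustrative):
<NodeType Name="NodeType0">
    <Endpoints>
        <HttpApplicationGatewayEndpoint Port="19081" Protocol="http" />
        <!-- other endpoints as above -->
    </Endpoints>
</NodeType>
<NodeType Name="NodeType1">
    <Endpoints>
        <HttpApplicationGatewayEndpoint Port="19181" Protocol="http" />
        <!-- other endpoints, also on unique ports -->
    </Endpoints>
</NodeType>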

jboss-esb fs-listener jbm message queue overflow

We have a JBoss ESB server that reads files from the file system on a schedule (every 20 seconds), converts them into ESB messages, and then parses the messages.
There are some other providers/listeners (JMS) and services configured on the ESB server. When there is an error in one of the services, it affects the process above: the file system provider (gateway) keeps working, but the jms-listener that takes the gateway messages stops working, and lots of messages accumulate in the JBM queue (the JBM_MSG Oracle DB table).
Here is the problem: when the server is restarted, messages in the JBM queue are processed by the ESB for just 20 seconds (the scheduled frequency of the fs-provider), then messages are never processed again and CPU usage goes up to 100% and stays there. We believe the fs-provider somehow interrupts the jms-provider.
Is there any configuration we have missed?
Here are the configuration files that we have:
jboss-esb.xml
<?xml version="1.0" encoding="UTF-8"?>
<jbossesb xmlns="http://anonsvn.labs.jboss.com/labs/jbossesb/trunk/product/etc/schemas/xml/jbossesb-1.0.1.xsd" parameterReloadSecs="5">
    <providers>
        <fs-provider name="SitaIstProvider">
            <fs-bus busid="gw_sita_ist">
                <fs-message-filter
                    directory="/ikarussita/IST/IN"
                    input-suffix=".RCV"
                    work-suffix=".lck"
                    post-delete="false"
                    post-directory="/ikarussita/IST/OK"
                    post-suffix=".ok"
                    error-delete="false"
                    error-directory="/ikarussita/IST/ERR"
                    error-suffix=".err"/>
            </fs-bus>
        </fs-provider>
        <jms-provider name="SitaESBQueue" connection-factory="ConnectionFactory">
            <jms-bus busid="esb_sita_queue">
                <jms-message-filter dest-type="QUEUE" dest-name="queue/esb_sita_queue"/>
            </jms-bus>
        </jms-provider>
    </providers>
    <services>
        <service category="SITA" name="SITA_IST" description="SITA Daemon For ISTCOXH">
            <listeners>
                <fs-listener name="Sita_Ist_Gateway" busidref="gw_sita_ist" is-gateway="true" schedule-frequency="20" />
                <jms-listener name="Jms_Sita_EsbAware" busidref="esb_sita_queue" />
            </listeners>
            <actions mep="OneWay">
                <action name="parse_msg" class="com.celebi.integration.action.sita.inbound.SitaHandler" process="parseMessage" />
                <action name="send_ikarus" class="com.celebi.integration.action.ikarus.outbound.fis.FlightJmsSender" />
            </actions>
        </service>
    </services>
</jbossesb>
jbm-queue-service.xml
<?xml version="1.0" encoding="UTF-8"?>
<server>
    <mbean code="org.jboss.jms.server.destination.QueueService"
           name="jboss.messaging.destination:service=Queue,name=esb_sita_queue"
           xmbean-dd="xmdesc/Queue-xmbean.xml">
        <depends optional-attribute-name="ServerPeer">jboss.messaging:service=ServerPeer</depends>
        <depends>jboss.messaging:service=PostOffice</depends>
    </mbean>
</server>
deployment.xml
<jbossesb-deployment>
    <depends>jboss.messaging.destination:service=Queue,name=esb_sita_queue</depends>
</jbossesb-deployment>
Thanks
Split the service into two separate services: one handling the JMS queue, the other the file poller. Specify the same action pipeline in both; that way you get the same functionality but without the threading issue. Also use the max-threads attribute on the listener to specify the number of reader threads, as sketched below.
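A rough sketch of the split, reusing the existing pipeline (the new service names and the max-threads value are illustrative):
<services>
    <service category="SITA" name="SITA_IST_FILE" description="SITA file gateway">
        <listeners>
            <fs-listener name="Sita_Ist_Gateway" busidref="gw_sita_ist" is-gateway="true" schedule-frequency="20" />
        </listeners>
        <actions mep="OneWay">
            <action name="parse_msg" class="com.celebi.integration.action.sita.inbound.SitaHandler" process="parseMessage" />
            <action name="send_ikarus" class="com.celebi.integration.action.ikarus.outbound.fis.FlightJmsSender" />
        </actions>
    </service>
    <service category="SITA" name="SITA_IST_JMS" description="SITA JMS consumer">
        <listeners>
            <jms-listener name="Jms_Sita_EsbAware" busidref="esb_sita_queue" max-threads="5" />
        </listeners>
        <actions mep="OneWay">
            <action name="parse_msg" class="com.celebi.integration.action.sita.inbound.SitaHandler" process="parseMessage" />
            <action name="send_ikarus" class="com.celebi.integration.action.ikarus.outbound.fis.FlightJmsSender" />
        </actions>
    </service>
</services>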