Hazelcast discovery in OpenShift: connection reset warnings from router IPs

An OpenShift router update from version 3.7 to version 3.9 caused hundreds of warnings in the OpenShift logs:
[timestamp] [hz._hzInstance_1_dev.IO.thread-in-2] WARN com.hazelcast.nio.tcp.TcpIpConnection - [x.x.19.150]:5701 [dev] [3.11.4] Connection[id=157132, /x.x.19.150:5701->/x.x.25.1:50370, endpoint=null, alive=false, type=NONE] closed. Reason: Exception in Connection[id=157132, /x.x.19.150:5701->/x.x.25.1:50370, endpoint=null, alive=true, type=NONE], thread=hz._hzInstance_1_dev.IO.thread-in-2 java.io.IOException: Connection reset by peer at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
[timestamp] [hz._hzInstance_1_dev.IO.thread-in-0] WARN com.hazelcast.nio.tcp.TcpIpConnection - [x.x.31.153]:5701 [dev] [3.11.4] Connection[id=156553, /x.x.31.153:5701->/x.x.9.1:48700, endpoint=null, alive=false, type=NONE] closed. Reason: Exception in Connection[id=156553, /x.x.31.153:5701->/x.x.9.1:48700, endpoint=null, alive=true, type=NONE], thread=hz._hzInstance_1_dev.IO.thread-in-0\njava.io.IOException: Connection reset by peer\n at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
[timestamp] [hz._hzInstance_1_dev.IO.thread-in-2] WARN com.hazelcast.nio.tcp.TcpIpConnection - [x.x.3.34]:5701 [dev] [3.11.4] Connection[id=157179, /x.x.3.34:5701->/x.x.25.1:60596, endpoint=null, alive=false, type=NONE] closed. Reason: Exception in Connection[id=157179, /x.x.3.34:5701->/x.x.25.1:60596, endpoint=null, alive=true, type=NONE]
[timestamp] [hz._hzInstance_1_dev.IO.thread-in-1] WARN com.hazelcast.nio.tcp.TcpIpConnection - [x.x.10.75]:5701 [dev] [3.11.4] Connection[id=157171, /x.x.10.75:5701->/x.x.25.1:33826, endpoint=null, alive=false, type=NONE] closed. Reason: Exception in Connection[id=157171, /x.x.10.75:5701->/x.x.25.1:33826, endpoint=null, alive=true, type=NONE]
[timestamp] [hz._hzInstance_1_dev.IO.thread-in-1] WARN com.hazelcast.nio.tcp.TcpIpConnection - [x.x.27.206]:5701 [dev] [3.11.4] Connection[id=157157, /x.x.27.206:5701->/x.x.25.1:49578, endpoint=null, alive=false, type=NONE] closed. Reason: Exception in Connection[id=157157, /x.x.27.206:5701->/x.x.25.1:49578, endpoint=null, alive=true, type=NONE]
[timestamp] [hz._hzInstance_1_dev.IO.thread-in-1] WARN com.hazelcast.nio.tcp.TcpIpConnection - [x.x.31.153]:5701 [dev] [3.11.4] Connection[id=157127, /x.x.31.153:5701->/x.x.25.1:42506, endpoint=null, alive=false, type=NONE] closed. Reason: Exception in Connection[id=157127, /x.x.31.153:5701->/x.x.25.1:42506, endpoint=null, alive=true, type=NONE]
The issue was temporarily "solved" by rolling back to version 3.7: no more warnings in the Hazelcast logs.
Current findings:
All exceptions contain IPs ending in x.x.x.1; those are the OpenShift router IPs.
Target ports are 50370, 48700, 60596, 39840, 35046, 59900, etc.
hazelcast.xml:
...
<properties>
    <property name="hazelcast.discovery.enabled">true</property>
    <property name="hazelcast.logging.type">slf4j</property>
</properties>
<network>
    <port port-count="1" auto-increment="false">5701</port>
    <reuse-address>true</reuse-address>
    <join>
        <multicast enabled="false"/>
        <kubernetes enabled="true">
            <namespace>project-name</namespace>
            <service-name>hazelcast-discovery</service-name>
        </kubernetes>
    </join>
</network>
Versions details:
Hazelcast version: 3.11.4
OpenShift Master: v3.9.68 (Kubernetes Master: v1.9.1+a0ce1bc657), routers version 3.9
Misc:
The issue could not be reproduced on another cluster (OpenShift 3.11, router 3.9, Hazelcast 3.11.4).
The issue was reproduced in the same cluster with Hazelcast version 3.10.
Questions:
What is the root cause of these warnings?
Can we tune our configuration to avoid such connections?
Would it help to:
remove <property name="hazelcast.discovery.enabled">true</property>
add <tcp-ip enabled="false"></tcp-ip>
(I couldn't find the default value for tcp-ip in the Hazelcast documentation. The official hazelcast.xml example explicitly sets tcp-ip to false.) A sketch of the resulting join section follows.
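For reference, a minimal sketch (our assumption, not a verified fix) of what the join section would look like with tcp-ip explicitly disabled while keeping the kubernetes discovery from the config above:

<join>
    <!-- explicitly disable the other join mechanisms -->
    <multicast enabled="false"/>
    <tcp-ip enabled="false"/>
    <!-- keep the kubernetes discovery plugin as before -->
    <kubernetes enabled="true">
        <namespace>project-name</namespace>
        <service-name>hazelcast-discovery</service-name>
    </kubernetes>
</join>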
Edit 24.10.2019. Deployment details:
The application and Hazelcast are in the same project. The application connects to Hazelcast by service name: hazelcast:5701.
We use a custom livenessProbe/readinessProbe that runs a bash script every 10s. See below.
We also have a route for Hazelcast. The endpoint /hazelcast/rest/cluster showed the correct number of members.
Here is full hazelcast config: https://gitlab.com/snippets/1907166
Are our service settings correct?
bash script for probes:
#!/bin/bash
URL="http://127.0.0.1:5701/hazelcast/health/node-state"
HTTP_RESPONSE=$(curl -m 5 -sS $URL | head -1)
if [ "_${HTTP_RESPONSE}" != "_ACTIVE" ]; then
    echo "failure on ${URL}, response: ${HTTP_RESPONSE}"
    exit 1
fi
exit 0
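For illustration, the probes are wired in roughly like this (the script path and delay values here are assumptions for the sketch, not our exact manifest):

livenessProbe:
  exec:
    command: ["/bin/bash", "/opt/hazelcast/healthcheck.sh"]   # assumed script location
  initialDelaySeconds: 30
  periodSeconds: 10
readinessProbe:
  exec:
    command: ["/bin/bash", "/opt/hazelcast/healthcheck.sh"]
  periodSeconds: 10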
Edit 25.10.2019:
$ oc describe svc hazelcast-discovery
Name: hazelcast-discovery
Namespace: [project-name]
Labels: app=hazelcast
template=hazelcast-statefulset-template
Annotations: service.alpha.kubernetes.io/tolerate-unready-endpoints=true
Selector: name=hazelcast-node-cluster
Type: ClusterIP
IP: None
Port: 5701-tcp 5701/TCP
TargetPort: 5701/TCP
Endpoints: x.x.1.45:5701,x.x.12.144:5701,x.x.13.251:5701 and more...
Session Affinity: None
Events: <none>
Pods were restarted after the issue, so IPs might differ from the logs.
Could it be connected to tolerate-unready-endpoints=true?

Related

Access kafka on cloud from onprem K8

I am trying to connect to a Kafka broker set up on Azure AKS from an on-prem Rancher K8s cluster over the internet. I have created a loadbalancer listener on the Azure Kafka; it creates 4 public IPs using the Azure load balancer service.
- name: external
  port: 9093
  type: loadbalancer
  tls: true
  authentication:
    type: tls
  configuration:
    bootstrap:
      loadBalancerIP: 172.22.199.99
      annotations:
        external-dns.alpha.kubernetes.io/hostname: bootstrap.example.com
    brokers:
      - broker: 0
        loadBalancerIP: 172.22.199.100
        annotations:
          external-dns.alpha.kubernetes.io/hostname: kafka-0.example.com
      - broker: 1
        loadBalancerIP: 172.22.199.101
        annotations:
          external-dns.alpha.kubernetes.io/hostname: kafka-1.example.com
      - broker: 2
        loadBalancerIP: 172.22.199.102
        annotations:
          external-dns.alpha.kubernetes.io/hostname: kafka-2.example.com
    brokerCertChainAndKey:
      secretName: source-kafka-listener-cert
      certificate: tls.crt
      key: tls.key
Now, to connect from on-prem, I have opened the firewall only for the bootstrap LB IP. My understanding is that the bootstrap will in turn route the request to the individual brokers, but that is not happening. When I try to connect, a connection is established with the bootstrap load balancer IP, but after that I get a timeout error.
2022-08-22 08:14:04,659 INFO Metrics reporters closed (org.apache.kafka.common.metrics.Metrics) [kafka-admin-client-thread | adminclient-1]
2022-08-22 08:14:04,659 ERROR Stopping due to error (org.apache.kafka.connect.cli.ConnectDistributed) [main] org.apache.kafka.connect.errors.ConnectException: Failed to connect to and describe Kafka cluster. Check worker's broker connection and security properties.
at org.apache.kafka.connect.util.ConnectUtils.lookupKafkaClusterId(ConnectUtils.java:70)
at org.apache.kafka.connect.util.ConnectUtils.lookupKafkaClusterId(ConnectUtils.java:51)
Please let me know if I have to open the firewall for the individual brokers as well.
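For what it's worth, here is how I would check from the on-prem side whether each advertised broker address is reachable at all (hostnames taken from the listener config above; assumes nc is available):

# the client first talks to the bootstrap address, then connects to each broker directly
nc -vz -w 5 bootstrap.example.com 9093
nc -vz -w 5 kafka-0.example.com 9093
nc -vz -w 5 kafka-1.example.com 9093
nc -vz -w 5 kafka-2.example.com 9093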

Hazelcast map on kubernetes not in sync

I am writing a Spring-based application in Kotlin with Hazelcast and ran into an issue when deploying to Kubernetes.
For Hazelcast on Kubernetes I use DNS lookup mode for discovery.
I have the following hazelcast configuration:
hazelcast:
  network:
    join:
      multicast:
        enabled: false
      kubernetes:
        enabled: true
        service-dns: my-application-hs
And the following service.yaml for the kubernetes deployment:
apiVersion: v1
kind: Service
metadata:
  name: my-application
spec:
  type: ClusterIP
  selector:
    component: my-application
  ports:
    - name: http
      port: 80
      protocol: TCP
      targetPort: http
---
apiVersion: v1
kind: Service
metadata:
  name: my-application-hs
spec:
  type: ClusterIP
  clusterIP: None
  selector:
    component: my-application
  ports:
    - name: hazelcast
      port: 5701
A hazelcast map is used like this:
import com.hazelcast.config.Config
import com.hazelcast.config.SerializerConfig
import com.hazelcast.core.Hazelcast
import com.hazelcast.core.HazelcastInstance
import com.hazelcast.map.IMap
import org.springframework.stereotype.Component

@Component
class CacheClientImplHazelcast {
    private val hz: HazelcastInstance

    init {
        val serializer = SerializerConfig()
            .setTypeClass(MyDto::class.java)
            .setImplementation(MyDtoSerializer())
        val config = Config()
        config.serializationConfig.addSerializerConfig(serializer)
        hz = Hazelcast.newHazelcastInstance(config)
    }

    fun getAllData(): List<MyDto> {
        val map: IMap<String, MyDto> = hz.getMap("my-map")
        return map.values.toList()
    }

    fun putData(key: String, myDto: MyDto) {
        val map: IMap<String, MyDto> = hz.getMap("my-map")
        map.put(key, myDto)
    }

    fun clear() {
        val map: IMap<String, MyDto> = hz.getMap("my-map")
        map.clear()
    }
}
When running 3 instances on kubernetes the logs from hazelcast always show me 4 entries, something like this:
Members {size:4, ver:53} [
Member [10.4.2.32]:5701 - c1e70d6f-a62d-4924-9815-36bb1f98f141
Member [10.4.3.25]:5702 - be96c292-8847-4f56-ae32-f27f380d7c5b
Member [10.4.2.32]:5702 - 6ca96bfd-eb74-4149-8630-a2488e76d97d
Member [10.4.11.41]:5702 - 7e8b0bc9-ad2b-41eb-afbf-b7af9ed497bd this
]
(Side question 1: why do I see 4 here instead of 3?)
Now, even though the members seem to be connected (at least the logs of the nodes all show the same member UUIDs), when I write data on one node it is not available on the other nodes. Calls to getAllData only show data that has been put into the Hazelcast map on that node. When I send requests to the individual nodes (curl in a shell) I only see a fraction of the data. When I send a request to the normal URL of the pod, then with round-robin I get data from the different nodes, which is not synchronized.
If I run the very same application locally with the following hazelcast.yaml:
hazelcast:
  network:
    join:
      multicast:
        enabled: false
Then it all works correctly and the cache really seems to be "synchronized" across different server instances running locally. For this test I start the application on different ports.
However, what is also strange: even if I run only 2 instances locally, the logs I see from Hazelcast indicate there are 4 members:
Members {size:4, ver:4} [
Member [192.168.68.106]:5701 - 01713c9f-7718-4ed4-b532-aaf62443c425
Member [192.168.68.106]:5702 - 09921004-88ef-4fe5-9376-b88869fde2bc
Member [192.168.68.106]:5703 - cea4b13f-d538-48f1-b0f2-6c53678c5823 this
Member [192.168.68.106]:5704 - 44d84e70-5b68-4c69-a45b-fee39bd75554
]
(Side question 2: why do I see 4 members even though I only started 2 locally?)
The main question now is: why does this setup not work on Kubernetes? Why does each node have a separate map that is not in sync with the other nodes?
Here are some log messages that could be relevant, but I was not able to identify an issue from them:
2022-04-19T12:29:31.414588357Z2022-04-19 12:29:31.414 INFO 1 --- [cached.thread-7] c.h.i.server.tcp.TcpServerConnector : [10.4.11.41]:5702 [dev] [5.1.1] Connecting to /10.4.3.25:5702, timeout: 10000, bind-any: true
2022-04-19T12:29:31.414806473Z2022-04-19 12:29:31.414 INFO 1 --- [.IO.thread-in-2] c.h.i.server.tcp.TcpServerConnection : [10.4.11.41]:5702 [dev] [5.1.1] Initialized new cluster connection between /10.4.11.41:5702 and /10.4.3.25:46475
2022-04-19T12:29:31.414905573Z2022-04-19 12:29:31.414 INFO 1 --- [cached.thread-4] c.h.i.server.tcp.TcpServerConnector : [10.4.11.41]:5702 [dev] [5.1.1] Connecting to /10.4.2.32:5702, timeout: 10000, bind-any: true
2022-04-19T12:29:31.416520854Z2022-04-19 12:29:31.416 INFO 1 --- [.IO.thread-in-0] c.h.i.server.tcp.TcpServerConnection : [10.4.3.25]:5702 [dev] [5.1.1] Initialized new cluster connection between /10.4.3.25:5702 and /10.4.11.41:40455
2022-04-19T12:29:31.416833551Z2022-04-19 12:29:31.416 INFO 1 --- [.IO.thread-in-1] c.h.i.server.tcp.TcpServerConnection : [10.4.2.32]:5702 [dev] [5.1.1] Initialized new cluster connection between /10.4.2.32:54433 and /10.4.11.41:5702
2022-04-19T12:29:31.417377114Z2022-04-19 12:29:31.417 INFO 1 --- [.IO.thread-in-0] c.h.i.server.tcp.TcpServerConnection : [10.4.11.41]:5702 [dev] [5.1.1] Initialized new cluster connection between /10.4.11.41:40455 and /10.4.3.25:5702
2022-04-19T12:29:31.417545174Z2022-04-19 12:29:31.417 INFO 1 --- [.IO.thread-in-2] c.h.i.server.tcp.TcpServerConnection : [10.4.2.32]:5702 [dev] [5.1.1] Initialized new cluster connection between /10.4.2.32:5702 and /10.4.11.41:53547
2022-04-19T12:29:31.418541840Z2022-04-19 12:29:31.418 INFO 1 --- [.IO.thread-in-1] c.h.i.server.tcp.TcpServerConnection : [10.4.11.41]:5702 [dev] [5.1.1] Initialized new cluster connection between /10.4.11.41:53547 and /10.4.2.32:5702
2022-04-19T12:29:31.419763311Z2022-04-19 12:29:31.419 INFO 1 --- [.IO.thread-in-2] c.h.i.server.tcp.TcpServerConnection : [10.4.3.25]:5702 [dev] [5.1.1] Initialized new cluster connection between /10.4.3.25:46475 and /10.4.11.41:5702
2022-04-19T12:29:31.676218042Z2022-04-19 12:29:31.675 INFO 1 --- [gulis.migration] c.h.i.partition.impl.MigrationManager : [10.4.2.32]:5701 [dev] [5.1.1] Repartitioning cluster data. Migration tasks count: 271
There are two issues here but only one problem.
why the extra instances
Spring (Boot) will create a Hazelcast instance for you if it finds a Hazelcast config file and no HazelcastInstance @Bean. You can fix this by excluding HazelcastAutoConfiguration.class or by returning the instance you create in your component class as a @Bean, as sketched below.
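For example, a minimal sketch of the second option (names and details here are illustrative, assuming the same serializer setup as in the question):

import com.hazelcast.config.Config
import com.hazelcast.core.Hazelcast
import com.hazelcast.core.HazelcastInstance
import org.springframework.context.annotation.Bean
import org.springframework.context.annotation.Configuration

@Configuration
class HazelcastConfig {

    // Exposing the instance as a bean lets Spring Boot's auto-configuration
    // reuse it instead of starting a second member in the same pod.
    @Bean
    fun hazelcastInstance(): HazelcastInstance {
        val config = Config()
        // register serializers etc. here, as in CacheClientImplHazelcast
        return Hazelcast.newHazelcastInstance(config)
    }
}

The component would then inject this HazelcastInstance instead of calling Hazelcast.newHazelcastInstance itself.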
why the data sync issue on kubernetes
Each pod accidentally has 2 Hazelcast nodes, one on 5701 and one on 5702. But your ClusterIP only lists 5701. Some instances in the pod can't be reached from outside. This will go away when you fix the first issue.

Strimzi Kafka external listeners Ingress error on VM [closed]

I'm trying to deploy Kafka with Strimzi on a Kubernetes cluster running on a VM (VMware Workstation 15 with Ubuntu 20.04), using kubeadm, kubelet, containerd, Calico, and MetalLB.
I can create the ingress-nginx controller service of type LoadBalancer with an IP from the range I have specified, but when I create the Kafka cluster and its external listeners of type ingress and try to associate the DNS, it crashes with the error:
Exceeded timeout of 300000ms while waiting for Ingress resource my-cluster-kafka-bootstrap in namespace default to be addressable
This is the whole stack trace (from Strimzi cluster operator logs)
2021-09-16 16:59:21 WARN AbstractOperator:481 - Reconciliation #100(timer) Kafka(default/my-cluster): Failed to reconcile
io.strimzi.operator.common.operator.resource.TimeoutException: Exceeded timeout of 300000ms while waiting for Ingress resource my-cluster-kafka-bootstrap in namespace default to be addressable
at io.strimzi.operator.common.Util$1.lambda$handle$1(Util.java:139) ~[io.strimzi.operator-common-0.25.0.jar:0.25.0]
at io.vertx.core.impl.future.FutureImpl$3.onFailure(FutureImpl.java:128) ~[io.vertx.vertx-core-4.1.2.jar:4.1.2]
at io.vertx.core.impl.future.FutureBase.lambda$emitFailure$1(FutureBase.java:71) ~[io.vertx.vertx-core-4.1.2.jar:4.1.2]
at io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:164) [io.netty.netty-common-4.1.66.Final.jar:4.1.66.Final]
at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:469) [io.netty.netty-common-4.1.66.Final.jar:4.1.66.Final]
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:500) [io.netty.netty-transport-4.1.66.Final.jar:4.1.66.Final]
at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:986) [io.netty.netty-common-4.1.66.Final.jar:4.1.66.Final]
at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74) [io.netty.netty-common-4.1.66.Final.jar:4.1.66.Final]
at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) [io.netty.netty-common-4.1.66.Final.jar:4.1.66.Final]
at java.lang.Thread.run(Thread.java:829) [?:?]
This is my Kafka Cluster manifest
apiVersion: kafka.strimzi.io/v1beta2
kind: Kafka
metadata:
  name: my-cluster
spec:
  kafka:
    version: 2.8.0
    replicas: 1
    listeners:
      - name: plain
        port: 9092
        type: internal
        tls: false
        authentication:
          type: scram-sha-512
      - name: external
        port: 9094
        type: ingress
        tls: true
        authentication:
          type: scram-sha-512
        configuration:
          bootstrap:
            host: localb.kafka.xxx.com
          brokers:
            - broker: 0
              host: local.kafka.xxx.com
and this is my ingress controller service
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE SELECTOR
ingress-nginx-controller LoadBalancer 10.111.221.8 10.104.187.226 80:30856/TCP,443:31698/TCP 14h app.kubernetes.io/component=controller,app.kubernetes.io/instance=ingress-nginx,app.kubernetes.io/name=ingress-nginx
Could you please help me out? How did you deploy Kafka with Strimzi on-prem?
The error from Strimzi means that the Ingress resources are missing the .status section. When the Nginx Ingress controller registers them, it normally sets the status to something like this:
status:
  loadBalancer:
    ingress:
      - ip: 192.168.1.245
where the IP address is the Ingress IP address (so in your case it would be 10.104.187.226). Strimzi is waiting for this, and without it, it will not see the Ingresses as ready.
But that did not happen in your case. From my experience, that mostly means the Ingress controller has not found the Ingress instances. They might be missing the right class name in the Ingress .spec or in the annotation, they might be in a namespace the Ingress controller is not watching, etc. Checking the log of the Ingress controller might help.
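For example, with the nginx controller the class can be set on the listener itself (a sketch, assuming your Strimzi version supports the class field in the listener configuration and that the controller watches the nginx class):

    listeners:
      - name: external
        port: 9094
        type: ingress
        tls: true
        authentication:
          type: scram-sha-512
        configuration:
          class: nginx               # ingress class the controller is watching
          bootstrap:
            host: localb.kafka.xxx.com
          brokers:
            - broker: 0
              host: local.kafka.xxx.com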

How do I scale up JBoss AMQ 6.3 in openshift?

I have a default setup of JBoss AMQ in OpenShift and want to scale it up to two pods. How do I accomplish this?
Just scaling it leads to the following:
INFO | Using Persistence Adapter: KahaDBPersistenceAdapter[/opt/amq/data/split-1/serverData/kahadb]
INFO | PListStore:[/opt/amq/data/split-1/serverData/backend-amq-12-j27mn/tmp_storage] started
INFO | Apache ActiveMQ 5.11.0.redhat-630343 (backend-amq-12-j27mn, ID:backend-amq-12-j27mn-35700-1597673091911-0:1) is starting
INFO | Listening for connections at: stomp://backend-amq-12-j27mn:61613?maximumConnections=1000&wireFormat.maxFrameSize=104857600&transport.hbGracePeriodMultiplier=2.5
INFO | Connector stomp started
INFO | Starting OpenShift discovery agent for service backend-amq-mesh transport type tcp
INFO | Network Connector DiscoveryNetworkConnector:NC:BrokerService[backend-amq-12-j27mn] started
INFO | Apache ActiveMQ 5.11.0.redhat-630343 (backend-amq-12-j27mn, ID:backend-amq-12-j27mn-35700-1597673091911-0:1) started
INFO | For help or more information please see: http://activemq.apache.org
INFO | Adding service: [tcp://100.62.169.46:61616, failed:false, connectionFailures:0]
INFO | Establishing network connection from vm://backend-amq-12-j27mn to tcp://100.62.169.46:61616
INFO | Connector vm://backend-amq-12-j27mn started
INFO | backend-amq-12-j27mn Shutting down NC
INFO | backend-amq-12-j27mn bridge to Unknown stopped
INFO | error with pending local brokerInfo on: vm://backend-amq-12-j27mn#0
org.apache.activemq.transport.TransportDisposedIOException: peer (vm://backend-amq-12-j27mn#1) stopped.
at org.apache.activemq.transport.vm.VMTransport.stop(VMTransport.java:230)[activemq-broker-5.11.0.redhat-630343.jar:5.11.0.redhat-630343]
at org.apache.activemq.transport.TransportFilter.stop(TransportFilter.java:65)[activemq-client-5.11.0.redhat-630343.jar:5.11.0.redhat-630343]
at org.apache.activemq.transport.TransportFilter.stop(TransportFilter.java:65)[activemq-client-5.11.0.redhat-630343.jar:5.11.0.redhat-630343]
at org.apache.activemq.transport.ResponseCorrelator.stop(ResponseCorrelator.java:132)[activemq-client-5.11.0.redhat-630343.jar:5.11.0.redhat-630343]
at org.apache.activemq.broker.TransportConnection.doStop(TransportConnection.java:1193)[activemq-broker-5.11.0.redhat-630343.jar:5.11.0.redhat-630343]
at org.apache.activemq.broker.TransportConnection$4.run(TransportConnection.java:1159)[activemq-broker-5.11.0.redhat-630343.jar:5.11.0.redhat-630343]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)[:1.8.0_171]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)[:1.8.0_171]
at java.lang.Thread.run(Thread.java:748)[:1.8.0_171]
INFO | Connector vm://backend-amq-12-j27mn stopped
WARN | Could not start network bridge between: vm://backend-amq-12-j27mn and: tcp://100.62.169.46:61616 due to: Connection refused (Connection refused)
INFO | Establishing network connection from vm://backend-amq-12-j27mn to tcp://100.62.169.46:61616
For pods to talk to each other you need to create a service:
apiVersion: v1
kind: Service
metadata:
  name: amq-svc
spec:
  selector:
    docker-registry: default
  ...
  ports:
    - nodePort: 0
      port: 61616
      protocol: TCP
      targetPort: 61616
For the number of pods, scaling, etc., you need to use a deployment config.
But rather than doing everything from scratch, I would just use an example project such as this. It already has a template ready for you to use; just edit it as per your requirements.
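Once the service and template are in place, scaling itself is just a matter of increasing the replica count on the deployment config, e.g. (the dc name backend-amq is assumed from the pod names in your log):

oc scale dc/backend-amq --replicas=2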

Multiple external names in a Kubernetes Service to access a remotely hosted MongoDB with a connection string

I would like to connect my Kubernetes Deployment to a remotely hosted database with URI.
I am able to connect to the remotely hosted database via the URI using Docker. Now I'd like to understand how I can specify multiple external names in a Kubernetes service file.
I have a MongoDB cluster with the below URL:
mongodb://username:password@mngd-new-pr1-01:27017,mngd-new-pr2-02:27017,mngd-new-pr3-03:27017/
I have followed Kubernetes best practices: mapping external services. When I setup a single external name, it is working.
How can I specify all 3 hosts in the externalName?
kind: Service
apiVersion: v1
metadata:
  name: mongo
spec:
  type: ExternalName
  externalName: mngd-new-pr1-01,mngd-new-pr2-02,mngd-new-pr3-03
  ports:
    - port: 27017
Since I was unable to create multiple external names, I went with creating a headless service and then created the Endpoints for the service, as described in "Scenario 1: Database outside cluster with IP address". A sketch of that setup is shown below.
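Roughly, the setup looks like this (a sketch; the IPs are placeholders, not the real cluster addresses):

kind: Service
apiVersion: v1
metadata:
  name: mongo
spec:
  clusterIP: None          # headless, no selector
  ports:
    - port: 27017
---
kind: Endpoints
apiVersion: v1
metadata:
  name: mongo              # must match the Service name
subsets:
  - addresses:
      - ip: 10.0.0.1       # placeholder for mngd-new-pr1-01
      - ip: 10.0.0.2       # placeholder for mngd-new-pr2-02
      - ip: 10.0.0.3       # placeholder for mngd-new-pr3-03
    ports:
      - port: 27017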
From the logs, I think the connectivity is being established, but later there was an exception like the one below and it was disconnected.
2019-03-20 11:26:13.941 INFO 1 --- [38.200.19:27038] org.mongodb.driver.connection : Opened connection [connectionId{localValue:1, serverValue:386066}] to .38.200.19:27038
2019-03-20 11:26:13.953 INFO 1 --- [.164.29.4:27038] org.mongodb.driver.connection : Opened connection [connectionId{localValue:2, serverValue:458254}] to .164.29.4:27038
2019-03-20 11:26:13.988 INFO 1 --- [38.200.19:27038] org.mongodb.driver.cluster : Monitor thread successfully connected to server with description ServerDescription{address=.38.200.19:27038, type=REPLICA_SET_PRIMARY, state=CONNECTED, ok=true, version=ServerVersion{versionList=[3, 6, 8]}, minWireVersion=0, maxWireVersion=6, maxDocumentSize=16777216, logicalSessionTimeoutMinutes=30, roundTripTimeNanos=45440955, setName='no-prd-rep', canonicalAddress=mngd-new-pr1-01:27038, hosts=[mngd-new-pr1-01:27038, mngd-new-pr1-02:27038, mngd-new-pr1-03:27038], passives=[], arbiters=[], primary='mngd-new-pr1-01:27038'
2019-03-20 11:26:13.990 INFO 1 --- [38.200.19:27038] org.mongodb.driver.cluster : Adding discovered server mngd-new-pr1-01:27038 to client view of cluster
2019-03-20 11:26:13.992 INFO 1 --- [38.200.19:27038] org.mongodb.driver.cluster : Adding discovered server mngd-new-pr1-02:27038 to client view of cluster
2019-03-20 11:26:13.993 INFO 1 --- [38.200.19:27038] org.mongodb.driver.cluster : Adding discovered server mngd-new-pr1-03:27038 to client view of cluster
2019-03-20 11:26:13.997 INFO 1 --- [38.200.19:27038] org.mongodb.driver.cluster : Server 102.227.4:27038 is no longer a member of the replica set. Removing from client view of cluster.
2019-03-20 11:26:14.001 INFO 1 --- [.164.29.4:27038] org.mongodb.driver.cluster : Monitor thread successfully connected to server with description ServerDescription{address=.164.29.4:27038, type=REPLICA_SET_SECONDARY, state=CONNECTED, ok=true, version=ServerVersion{versionList=[3, 6, 8]}, minWireVersion=0, maxWireVersion=6, maxDocumentSize=16777216, logicalSessionTimeoutMinutes=30, roundTripTimeNanos=47581993, setName='no-prd-rep', canonicalAddress=mngd-new-pr1-01:27038, hosts=[mngd-new-pr1-01:27038, mngd-new-pr1-02:27038, mngd-new-pr1-03:27038], passives=[], arbiters=[], primary='mngd-new-pr1-01:27038',
2019-03-20 11:26:14.001 INFO 1 --- [38.200.19:27038] org.mongodb.driver.cluster : Server 38.200.19:27038 is no longer a member of the replica set. Removing from client view of cluster.
2019-03-20 11:26:14.001 INFO 1 --- [38.200.19:27038] org.mongodb.driver.cluster : Server 164.29.4:27038 is no longer a member of the replica set. Removing from client view of cluster.
2019-03-20 11:26:14.001 INFO 1 --- [38.200.19:27038] org.mongodb.driver.cluster : Canonical address mngd-new-pr1-01:27038 does not match server address. Removing .38.200.19:27038 from client view of cluster
2019-03-20 11:26:34.012 INFO 1 --- [2-prd2-01:27038] org.mongodb.driver.cluster : Exception in monitor thread while connecting to server mngd-new-pr1-01:27038
com.mongodb.MongoSocketException: mngd-new-pr1-01: Name or service not known
at com.mongodb.ServerAddress.getSocketAddress(http://ServerAddress.java:188 ) ~[mongodb-driver-core-3.6.4.jar!/:na]
So, since we are using the endpoints as IP addresses and they do not match the connection string specified in the deployment YAML, it might be failing.
This is really confusing me a lot :)
PS: to check the connectivity to the external Mongo cluster, I launched a single pod:
apiVersion: v1
kind: Pod
metadata:
  name: proxy-chk
spec:
  containers:
    - name: centos
      image: centos
      command: ["/bin/sh", "-c", "while : ;do curl -L http://${MONGODBendpointipaddress}:27038/; sleep 10; done"]
In the logs I can see that it is able to establish connectivity:
"It looks like you are trying to access MongoDB over HTTP on the native driver port."
So I think the headless service which I created earlier is able to route the traffic.
I need your advice.
One alternative could be to create one ExternalName Service per Mongo host (see the sketch below), but that defeats the abstraction if you need to add more hosts in the future.
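A sketch of that per-host alternative (service names are placeholders); the connection string would then reference these service names instead of the external hostnames:

kind: Service
apiVersion: v1
metadata:
  name: mongo-0
spec:
  type: ExternalName
  externalName: mngd-new-pr1-01
---
kind: Service
apiVersion: v1
metadata:
  name: mongo-1
spec:
  type: ExternalName
  externalName: mngd-new-pr2-02
---
kind: Service
apiVersion: v1
metadata:
  name: mongo-2
spec:
  type: ExternalName
  externalName: mngd-new-pr3-03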