I have a Redis master-slave setup (1 master, 2 slaves) running in a Kubernetes cluster; Redis is managed by Sentinel and deployed as a StatefulSet.
After some time, the replication connection between master and slave breaks with the error below.
Error condition on socket for SYNC: Connection reset by peer
Below is the Sentinel log from the same time:
1:X 23 May 2022 07:23:42.330 # +sdown sentinel 443057a099717ff138d37eb016e608fa5ecffd36 172.20.129.98 26379 # mymaster 172.20.129.98 6379
1:X 23 May 2022 07:23:42.531 # +sdown master mymaster 172.20.129.98 6379
1:X 23 May 2022 07:23:42.607 # +odown master mymaster 172.20.129.98 6379 #quorum 2/2
In most cases it comes back online again without any restart of the pod or container,
but in some cases the slave pods are terminated.
There are no changes in any infra metrics (CPU, memory).
What could be the issue?
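For reference, here is a minimal sketch of the checks that usually narrow this down; the pod name redis-node-0, the container names redis/sentinel, and the password redispassword are assumptions about this StatefulSet and would need to be replaced with the real ones:
# replication state as seen by the current master
$ kubectl exec redis-node-0 -c redis -- redis-cli -a redispassword info replication
# timeouts and buffer limits that commonly cause a reset during SYNC
$ kubectl exec redis-node-0 -c redis -- redis-cli -a redispassword config get repl-timeout
$ kubectl exec redis-node-0 -c redis -- redis-cli -a redispassword config get client-output-buffer-limit
# which node Sentinel currently considers the master
$ kubectl exec redis-node-0 -c sentinel -- redis-cli -p 26379 sentinel get-master-addr-by-name mymaster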
I am trying to set up a Kubernetes cluster using Oracle VM VirtualBox. The kubeadm command fails to start the cluster.
It waits on the following:
[wait-control-plane] Waiting for the kubelet to boot up the control plane as static Pods from directory "/etc/kubernetes/manifests". This can take up to 4m0s
Then it fails because of the following:
[kubelet-check] It seems like the kubelet isn't running or healthy.
[kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10248/healthz' failed with error: Get "http://localhost:10248/healthz": dial tcp 127.0.0.1:10248: connect: connection refused.
OS: Ubuntu 16.04 (Xenial), Docker version: 18.09.7, Kubernetes version: v1.23.5, Cluster type: Flannel
OS: Ubuntu 16.04 (Xenial), Docker version: 20.10.7, Kubernetes version: v1.23.5, Cluster type: Calico
What I have tried so far, with the help of Google:
turned off swap (which was already done)
the combinations of Kubernetes and Docker versions listed above
restarting the kubelet service
other bits I do not remember
ensured that the static IPs have been allocated, and other prerequisites
Can anyone assist? I am new to Kubernetes.
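In case it helps with diagnosis, here is a minimal sketch of how the kubelet state can be collected after the failure (it assumes a systemd-managed kubelet and Docker as the container runtime, as in the setups above):
# is the kubelet running, and why did it last exit?
$ sudo systemctl status kubelet
$ sudo journalctl -u kubelet --no-pager | tail -n 50
# with kubeadm + Docker, a cgroup driver mismatch is a common cause; compare the two
$ docker info 2>/dev/null | grep -i cgroup
$ sudo grep -i cgroup /var/lib/kubelet/config.yaml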
I am trying to have 1 Redis master with 2 Redis replicas, watched by a quorum of 3 Sentinels, on Kubernetes. I am very new to Kubernetes.
My initial plan was to have the master running in a pod tied to one Kubernetes SVC, and the 2 replicas running in their own pods tied to another SVC. Finally, the 3 Sentinel pods would be tied to their own SVC. The replicas would be tied to the master SVC (because without a Service, the IP will change). The Sentinels would also be configured and tied to the master and replica SVCs. But I'm not sure if this is feasible, because when the master pod crashes, how will one of the replica pods move to the master SVC and become the master? Is that possible?
The second approach I had was to wrap the Redis pods in a replication controller, and the same for Sentinel. However, I'm not sure how to make one of the pods the master and the others replicas with a replication controller.
Would any of the two approaches work? If not, is there a better design that I can adopt? Any leads would be appreciated.
You can deploy Redis Sentinel using the Helm package manager and the Redis Helm Chart.
If you don't have Helm3 installed yet, you can use this documentation to install it.
I will provide a few explanations to illustrate how it works.
First we need to get the values.yaml file from the Redis Helm Chart to customize our installation:
$ wget https://raw.githubusercontent.com/bitnami/charts/master/bitnami/redis/values.yaml
We can configure a lot of parameters in the values.yaml file, but for demonstration purposes I only enabled Sentinel and set the Redis password:
NOTE: For a list of parameters that can be configured during installation, see the Redis Helm Chart Parameters documentation.
# values.yaml
global:
  redis:
    password: redispassword
...
replica:
  replicaCount: 3
...
sentinel:
  enabled: true
...
Then we can deploy Redis using the configuration from the values.yaml file:
NOTE: It will deploy a three-Pod cluster (one master and two slaves) managed by a StatefulSet, with a sentinel container running inside each Pod.
$ helm install redis-sentinel bitnami/redis --values values.yaml
Be sure to carefully read the NOTES section of the chart installation output. It contains a lot of useful information (e.g. how to connect to your database from outside the cluster).
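If you need to re-read those notes later, they can be printed again for the release (assuming the release name redis-sentinel used above):
$ helm get notes redis-sentinel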
After installation, check the Redis StatefulSet, Pods and Services (the headless Service can be used for internal access):
$ kubectl get pods -o wide
NAME READY STATUS RESTARTS AGE IP
redis-sentinel-node-0 2/2 Running 0 2m13s 10.4.2.21
redis-sentinel-node-1 2/2 Running 0 86s 10.4.0.10
redis-sentinel-node-2 2/2 Running 0 47s 10.4.1.10
$ kubectl get sts
NAME READY AGE
redis-sentinel-node 3/3 2m41s
$ kubectl get svc
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
redis-sentinel ClusterIP 10.8.15.252 <none> 6379/TCP,26379/TCP 2m
redis-sentinel-headless ClusterIP None <none> 6379/TCP,26379/TCP 2m
As you can see, each redis-sentinel-node Pod contains the redis and sentinel containers:
$ kubectl get pods redis-sentinel-node-0 -o jsonpath={.spec.containers[*].name}
redis sentinel
We can check the sentinel container logs to find out which redis-sentinel-node is the master:
$ kubectl logs -f redis-sentinel-node-0 sentinel
...
1:X 09 Jun 2021 09:52:01.017 # Configuration loaded
1:X 09 Jun 2021 09:52:01.019 * monotonic clock: POSIX clock_gettime
1:X 09 Jun 2021 09:52:01.019 * Running mode=sentinel, port=26379.
1:X 09 Jun 2021 09:52:01.026 # Sentinel ID is 1bad9439401e44e749e2bf5868ad9ec7787e914e
1:X 09 Jun 2021 09:52:01.026 # +monitor master mymaster 10.4.2.21 6379 quorum 2
...
1:X 09 Jun 2021 09:53:21.429 * +slave slave 10.4.0.10:6379 10.4.0.10 6379 # mymaster 10.4.2.21 6379
1:X 09 Jun 2021 09:53:21.435 * +slave slave 10.4.1.10:6379 10.4.1.10 6379 # mymaster 10.4.2.21 6379
...
As you can see from the logs above, the redis-sentinel-node-0 Pod is the master and the redis-sentinel-node-1 & redis-sentinel-node-2 Pods are slaves.
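Instead of reading the logs, the role can also be queried directly on each Pod; a quick sketch, assuming the redispassword value set in values.yaml:
$ kubectl exec redis-sentinel-node-0 -c redis -- redis-cli -a redispassword info replication | grep role
role:master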
For testing, let's delete the master and check if sentinel will switch the master role to one of the slaves:
$ kubectl delete pod redis-sentinel-node-0
pod "redis-sentinel-node-0" deleted
$ kubectl logs -f redis-sentinel-node-1 sentinel
...
1:X 09 Jun 2021 09:55:20.902 # Executing user requested FAILOVER of 'mymaster'
...
1:X 09 Jun 2021 09:55:22.666 # +switch-master mymaster 10.4.2.21 6379 10.4.1.10 6379
...
1:X 09 Jun 2021 09:55:50.626 * +slave slave 10.4.0.10:6379 10.4.0.10 6379 # mymaster 10.4.1.10 6379
1:X 09 Jun 2021 09:55:50.632 * +slave slave 10.4.2.22:6379 10.4.2.22 6379 # mymaster 10.4.1.10 6379
A new master (redis-sentinel-node-2 10.4.1.10) has been selected, so everything works as expected.
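A failover can also be triggered manually against any Sentinel, without deleting a Pod; a sketch with the same names as above (add the password flag if Sentinel is password-protected in your chart version):
$ kubectl exec redis-sentinel-node-1 -c sentinel -- redis-cli -p 26379 sentinel failover mymaster
OK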
Additionally, we can display more information by connecting to one of the Redis nodes:
$ kubectl run --namespace default redis-client --restart='Never' --env REDIS_PASSWORD=redispassword --image docker.io/bitnami/redis:6.2.1-debian-10-r47 --command -- sleep infinity
pod/redis-client created
$ kubectl exec --tty -i redis-client --namespace default -- bash
I have no name!@redis-client:/$ redis-cli -h redis-sentinel-node-1.redis-sentinel-headless -p 6379 -a $REDIS_PASSWORD
Warning: Using a password with '-a' or '-u' option on the command line interface may not be safe.
redis-sentinel-node-1.redis-sentinel-headless:6379> info replication
# Replication
role:slave
master_host:10.4.1.10
master_port:6379
master_link_status:up
...
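Sentinel-aware clients can also ask Sentinel for the current master address through the redis-sentinel Service; for example, from the same redis-client pod (the password flag may or may not be required for Sentinel, depending on the chart version):
$ redis-cli -h redis-sentinel -p 26379 sentinel get-master-addr-by-name mymaster
1) "10.4.1.10"
2) "6379"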
Scenario:
One of the worker nodes goes down due to a power cycle while the master is scheduling pods across the worker nodes.
Once the worker node comes back up after the power cycle, the master is able to schedule the remaining pods onto it.
However, all the pods scheduled onto that worker node are stuck in the "ContainerCreating" state for a long time, which makes the worker node useless after the power cycle.
Cluster Details:
Docker Version: 18.06.1-ce
Kubernetes version: v1.14.0
helm version - v2.12.1
Host OS: Centos 7
Cloud being used: (put bare-metal if not on a public cloud)
Installation method: Ansible Script
Kubelet log:
Line 322: Jul 26 15:44:57 k8sworker3 kubelet[1832]: E0726 15:44:57.842527 1832 cni.go:331] Error adding logging_filebeat-kvdjg/acb6582a56c6d77fdb3364a1e0ab1dd47d31f63c64eb197903dc82007be4c7df to network weave-net/weave: unable to allocate IP address: Post http://127.0.0.1:6784/ip/acb6582a56c6d77fdb3364a1e0ab1dd47d31f63c64eb197903dc82007be4c7df: dial tcp 127.0.0.1:6784: connect: connection refused
Line 326: Jul 26 15:44:57 k8sworker3 kubelet[1832]: weave-cni: unable to release IP address: Delete http://127.0.0.1:6784/ip/acb6582a56c6d77fdb3364a1e0ab1dd47d31f63c64eb197903dc82007be4c7df: dial tcp 127.0.0.1:6784: connect: connection refused
Line 342: Jul 26 15:44:58 k8sworker3 kubelet[1832]: E0726 15:44:58.073865 1832 cni.go:331] Error adding vz1-db-backup_vz1-warrior-job-5d242b94c6ba2500011bfedc-1564172937569-pwpq2/a991d0c781d5c3ec6c2dca9753fc8a1a2958b762a75b3d619f3da3744c41d160 to network weave-net/weave: unable to allocate IP address: Post http://127.0.0.1:6784/ip/a991d0c781d5c3ec6c2dca9753fc8a1a2958b762a75b3d619f3da3744c41d160: dial tcp 127.0.0.1:6784: connect: connection refused
Line 349: Jul 26 15:44:58 k8sworker3 kubelet[1832]: E0726 15:44:58.093351 1832 remote_runtime.go:109] RunPodSandbox from runtime service failed: rpc error: code = Unknown desc = failed to set up sandbox container "acb6582a56c6d77fdb3364a1e0ab1dd47d31f63c64eb197903dc82007be4c7df" network for pod "filebeat-kvdjg": NetworkPlugin cni failed to set up pod "filebeat-kvdjg_logging" network: unable to allocate IP address: Post http://127.0.0.1:6784/ip/acb6582a56c6d77fdb3364a1e0ab1dd47d31f63c64eb197903dc82007be4c7df: dial tcp 127.0.0.1:6784: connect: connection refused
Please suggest how to prevent this issue.
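For reference, the errors above come from the Weave Net agent on that node (the CNI plugin talks to it on 127.0.0.1:6784), so its state after the power cycle can be checked roughly like this; a sketch that assumes Weave Net runs as the usual DaemonSet in kube-system with the label name=weave-net:
$ kubectl get pods -n kube-system -o wide -l name=weave-net | grep k8sworker3
$ kubectl logs -n kube-system <weave-pod-on-k8sworker3> weave --previous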
I am running Kubernetes cluster which is configured with a master and 3 nodes.
#kubectl get nodes
NAME STATUS AGE
minion-1 Ready 46d
minion-2 Ready 46d
minion-3 Ready 46d
I have launched a couple of pods in the cluster and found that the pods are stuck in the Pending state.
# kubectl get pods -o wide
NAME READY STATUS RESTARTS AGE IP NODE
httpd 0/1 Pending 0 10m <none>
nginx 0/1 Pending 0 11m <none>
The YAML file for one of the pods, "httpd":
# cat http.yaml
apiVersion: v1
kind: Pod
metadata:
  name: httpd
  labels:
    env: test
spec:
  containers:
  - name: httpd
    image: httpd
While debugging the reason for the failure, I found that a couple of the configured nodes are not ready. Only one node is reachable from the master.
# ping minion-1
PING minion-1 (172.31.24.204) 56(84) bytes of data.
64 bytes from minion-1 (172.31.24.204): icmp_seq=1 ttl=64 time=0.575 ms
Whereas the other nodes are not reachable:
# ping minion-2
PING minion-2 (172.31.29.95) 56(84) bytes of data.
From master (172.31.16.204) icmp_seq=1 Destination Host Unreachable
# ping minion-3
PING minion-3 (172.31.17.252) 56(84) bytes of data.
From master (172.31.16.204) icmp_seq=1 Destination Host Unreachable
The queries that I have here are:
1) Why does Kubernetes show the nodes as Ready even though they are not reachable from the master?
2) Why is the pod creation failing? Is it because of the unavailability of nodes, or a configuration issue in the YAML file?
# kubectl describe pod httpd
Name: httpd
Namespace: default
Node: /
Labels: env=test
Status: Pending
IP:
Controllers: <none>
Containers:
httpd:
Image: httpd
Port:
Volume Mounts: <none>
Environment Variables: <none>
No volumes.
QoS Class: BestEffort
Tolerations: <none>
No events.
Following are the Kubernetes and etcd versions.
# kubectl --version
Kubernetes v1.5.2
[root@raghavendar1 ~]# et
etcd etcdctl ether-wake ethtool
[root@raghavendar1 ~]# etcd --version
etcd Version: 3.2.5
Git SHA: d0d1a87
Go Version: go1.8.3
Go OS/Arch: linux/amd64
Kubernetes does not use the ICMP protocol to check node-to-master connectivity.
Nodes become Ready when the node -> api-server communication works, and this is done over the HTTPS protocol.
You can read more about node - master connectivity in the Kubernetes documentation: https://kubernetes.io/docs/concepts/architecture/master-node-communication/
Why isn't the pod scheduled?
The answer to this question is probably in the master logs; check kube-apiserver.log and kube-scheduler.log. The reason is cluster misconfiguration.
To start with, run everything in a single network to get a grip on things, and double-check routing.
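A quick way to see what the kubelet is actually reporting (instead of relying on ping) is to look at the node conditions and the recent events, for example:
$ kubectl describe node minion-2 | grep -A 6 Conditions
$ kubectl get events --all-namespaces | grep -i minion-2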
I'm using a Kubernetes cluster, deployed on Azure using the ACS-engine. My cluster is composed of 5 nodes.
1 master (unix VM) (v1.6.2)
2 unix agents (v1.6.2)
2 windows agents (v1.6.0-alpha.1.2959+451473d43a2072)
I have created a unix pod, described below:
Name: ping-with-unix
Node: k8s-linuxpool1-25103419-0/10.240.0.5
Start Time: Fri, 30 Jun 2017 14:27:28 +0200
Status: Running
IP: 10.244.2.6
Controllers: <none>
Containers:
ping-with-unix-2:
Container ID:
Image: willfarrell/ping
Port:
State: Running
Started: Fri, 30 Jun 2017 14:27:29 +0200
Ready: True
Restart Count: 0
Environment:
HOSTNAME: google.com
Mounts:
/var/run/secrets/kubernetes.io/serviceaccount from default-token-1nrh5 (ro)
Conditions:
Type Status
Initialized True
Ready True
PodScheduled True
Volumes:
default-token-1nrh5:
Type: Secret (a volume populated by a Secret)
SecretName: default-token-1nrh5
Optional: false
QoS Class: BestEffort
Node-Selectors: beta.kubernetes.io/os=linux
Tolerations: <none>
Events: <none>
This pod does not have internet access.
2017-06-30T12:27:29.512885000Z ping google.com every 300 sec
2017-06-30T12:27:29.521968000Z PING google.com (172.217.17.78): 56 data bytes
2017-06-30T12:27:39.698081000Z --- google.com ping statistics ---
2017-06-30T12:27:39.698305000Z 1 packets transmitted, 0 packets received, 100% packet loss
I created a 2nd pod targeting a Windows container, with a custom Docker image. This image instantiates an HttpClient and requests an endpoint. It also does not have internet access. I can access the container and run an interactive PowerShell, but I cannot resolve any DNS names (due to the lack of internet access).
PS C:\app> ping github.com
Ping request could not find host github.com. Please check the name and try again.
PS C:\app> ping 192.30.253.112
Pinging 192.30.253.112 with 32 bytes of data:
Request timed out.
Request timed out.
Ping statistics for 192.30.253.112:
Packets: Sent = 2, Received = 0, Lost = 2 (100% loss),
Control-C
What do I have to configure to allow my containers to access the internet?
Remark: I have not defined any NetworkPolicy.
I've updated my cluster using the API version '2017-07-01' and Kubernetes version '1.6.6'. Both my unix and windows pods now have internet access.
Note, for Windows pods:
Internet is available 2 or 3 minutes after the pod starts.
I can't set the dnsPolicy to "Default"; only "ClusterFirst" or "ClusterFirstWithHostNet" works.
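To verify what a pod actually ends up with, the effective DNS configuration inside it can be checked, e.g. for the unix pod above (this assumes the image contains cat and nslookup, which most busybox/alpine-based images do):
$ kubectl exec ping-with-unix -- cat /etc/resolv.conf
$ kubectl exec ping-with-unix -- nslookup google.com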