Monitor that service to pod iptables mappings are current - kubernetes

The problem occurred on kubernetes 1.2.3 but we are running 1.3.3 now.
We have had 2 situations where kube-proxy was running but was wedged and not updating iptables with the current state of services to pods. This led to a situation where traffic destined for serviceA got routed to pods that are part of serviceB. So we have improved our monitoring after the fact to query /healthz on the kube-proxy. I'm wondering if I should be monitoring anything beyond the existence of the kube-proxy process and that it's returning 200 from /healthz.
Are you monitoring anything additional to ensure that service to pod mappings are current? I realize that as the service landscape changes there can be a period of time where not all hosts are accurate, but I'm only interested in catching the scenario where, say, 3+ minutes have gone by and iptables is not current on every node in the cluster, which would seem to indicate that something is broken somewhere.
I had thought about having a canary service where the backing deployment gets redeployed every 5 minutes, and then verifying from each node that I can reach all of the backing pods via the service cluster IP.
I'm not sure if this is the right approach. It seems like it could catch the problem we had earlier, but I'm also wondering whether a simpler way exists, like just checking the timestamp of when iptables was last updated?
Thanks!

You could run kube-proxy inside a pod (by dropping a manifest inside /etc/kubernetes/manifests on each node), benefit from the health checking / liveness probes offered by Kubernetes, and let it take care of restarting the service for you in case of trouble.
Setting a very low threshold on the liveness probe will trigger a restart as soon as the /healthz endpoint takes too long to respond. It won't guarantee that iptables rules are always up to date, but it will ensure that kube-proxy is always healthy (which in turn should help keep the iptables rules consistent).
Example:
Check the healthz endpoint of kube-proxy every 10s. Restart the container if it doesn't respond within 1s:
apiVersion: v1
kind: Pod
metadata:
  name: kube-proxy
  namespace: kube-system
spec:
  hostNetwork: true
  containers:
  - name: kube-proxy
    image: gcr.io/google_containers/hyperkube:v1.3.4
    command:
    - /hyperkube
    - proxy
    - --master=https://master.kubernetes.io:6443
    - --kubeconfig=/conf/kubeconfig
    - --proxy-mode=iptables
    livenessProbe:
      httpGet:
        path: /healthz
        port: 10249
      timeoutSeconds: 1
      periodSeconds: 10
      failureThreshold: 1
    securityContext:
      privileged: true
    volumeMounts:
    - mountPath: /conf/kubeconfig
      name: kubeconfig
      readOnly: true
    - mountPath: /ssl/kubernetes
      name: ssl-certs-kubernetes
      readOnly: true
    - mountPath: /etc/ssl/certs
      name: ssl-certs-host
      readOnly: true
  volumes:
  - hostPath:
      path: /etc/kubernetes/proxy-kubeconfig.yml
    name: kubeconfig
  - hostPath:
      path: /etc/kubernetes/ssl
    name: ssl-certs-kubernetes
  - hostPath:
      path: /usr/share/ca-certificates
    name: ssl-certs-host
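The canary idea raised in the question could be sketched roughly as follows. This is only an illustration under assumptions: the name proxy-canary, the nginx image, and the replica count are all hypothetical, and the apiVersion shown is the current one rather than the one available on a 1.3-era cluster.

```yaml
# Hypothetical canary workload; all names and the image are illustrative.
apiVersion: apps/v1   # on a 1.3-era cluster this would be extensions/v1beta1
kind: Deployment
metadata:
  name: proxy-canary
  namespace: kube-system
spec:
  replicas: 2
  selector:
    matchLabels:
      app: proxy-canary
  template:
    metadata:
      labels:
        app: proxy-canary
    spec:
      containers:
      - name: web
        image: nginx          # any small HTTP server would do
        ports:
        - containerPort: 80
---
# ClusterIP Service in front of the canary pods.
apiVersion: v1
kind: Service
metadata:
  name: proxy-canary
  namespace: kube-system
spec:
  selector:
    app: proxy-canary
  ports:
  - port: 80
    targetPort: 80
```

A per-node check would then redeploy the canary on a schedule and request the Service's cluster IP from every node; a node that keeps failing after the pods have changed is a candidate for stale iptables rules.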

Related

Debugging slow Kubernetes deployment

We are using K8S in a managed Azure environment, Minikube in Ubuntu and a Rancher cluster built on on-prem machines and in general, our deployments take up to about 30 seconds to pull containers, run up and be ready. However, my latest attempt to create a deployment (on-prem) takes upwards of a minute and sometimes longer. It is a small web service which is very similar to our other deployments. The only (obvious) difference is the use of a startup probe and a liveness probe, although some of our other services do have probes, they are different though.
After removing Octopus deploy from the equation by extracting the yaml it was running and using kubectl, as soon as the (single) pod starts, I start reading the logs and as expected, the startup and liveness probes are called very quickly. Startup succeeds and the cluster starts calling the live probe, which also succeeds. However, if I use kubectl describe on the pod, it shows Initialized and PodScheduled as True but ContainersReady (there is one container) and Ready are both false for around a minute. I can't see what would cause this other than probe failures but these are logged as successful.
They eventually start and work OK but I don't know why they take so long.
kind: Deployment
apiVersion: apps/v1
metadata:
  name: 'redirect-files-deployments-28775'
  labels:
    Octopus.Kubernetes.SelectionStrategyVersion: "SelectionStrategyVersion2"
    # OtherOctopusLabels
spec:
  replicas: 1
  selector:
    matchLabels:
      Octopus.Kubernetes.DeploymentName: 'redirect-files-deployments-28775'
  template:
    metadata:
      labels:
        Octopus.Kubernetes.SelectionStrategyVersion: "SelectionStrategyVersion2"
        # OtherOctopusLabels
    spec:
      containers:
      - name: redirect-files
        image: ourregistry.azurecr.io/microservices.redirectfiles:1.0.34
        ports:
        - name: http
          containerPort: 80
          protocol: TCP
        env:
        # removed connection strings etc
        livenessProbe:
          httpGet:
            path: /api/version
            port: 80
            scheme: HTTP
          successThreshold: 1
        startupProbe:
          httpGet:
            path: /healthcheck
            port: 80
            scheme: HTTP
            httpHeaders:
            - name: X-SS-Authorisation
              value: asdkjlkwe098sad0akkrweklkrew
          initialDelaySeconds: 5
          timeoutSeconds: 5
      imagePullSecrets:
      - name: octopus-feedcred-feeds-azure-container-registry
So the cause was the startup and/or liveness probes. When I removed them, the deployment time went from over a minute to 18 seconds, despite the logs proving that the probes were called successfully very quickly after containers were started.
At least I now have something more concrete to look for.
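The answer does not identify which probe setting caused the delay. One way to narrow it down, offered as a sketch rather than a confirmed fix, is to write out the timing fields that the original manifest leaves at their defaults (periodSeconds defaults to 10, failureThreshold to 3, timeoutSeconds to 1) and vary them one at a time. The specific values below are illustrative assumptions.

```yaml
# Sketch only: values are illustrative, not a confirmed fix.
# The httpHeaders entry from the original startupProbe is omitted here.
startupProbe:
  httpGet:
    path: /healthcheck
    port: 80
    scheme: HTTP
  initialDelaySeconds: 5
  periodSeconds: 2        # default is 10
  timeoutSeconds: 5
  failureThreshold: 30    # default is 3; 30 x 2s allows roughly 60s of startup
livenessProbe:
  httpGet:
    path: /api/version
    port: 80
    scheme: HTTP
  periodSeconds: 10       # default, written out explicitly
  failureThreshold: 3     # default, written out explicitly
```

With the intervals explicit, the timestamps in kubectl describe can be compared against the expected probe schedule to see where the extra time goes.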

How can I ignore failure of a container in multi-container pod?

I have a multi-container application: app + sidecar. Both containers are supposed to be alive all the time, but the sidecar is not really that important.
The sidecar depends on an external resource; if this resource is not available, the sidecar crashes and takes the entire pod down. Kubernetes tries to recreate the pod and fails because the sidecar now won't start.
But from my business logic perspective, a crash of the sidecar is absolutely normal. Having that sidecar is nice but not mandatory.
I don't want the sidecar to take the main app with it when it crashes.
What would be the best Kubernetes-native way to achieve that?
Is it possible to tell Kubernetes to ignore a failure of the sidecar as a "false positive" event which is absolutely fine?
I can't find anything in the pod specification that controls that behaviour.
My yaml:
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: myapp
spec:
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1
      maxUnavailable: 0
  template:
    metadata:
      labels:
        app: myapp
    spec:
      volumes:
      - name: logs-dir
        emptyDir: {}
      containers:
      - name: myapp
        image: ${IMAGE}
        ports:
        - containerPort: 9009
        volumeMounts:
        - name: logs-dir
          mountPath: /usr/src/app/logs
        resources:
          limits:
            cpu: "1"
            memory: "512Mi"
        readinessProbe:
          initialDelaySeconds: 60
          failureThreshold: 8
          timeoutSeconds: 1
          periodSeconds: 8
          httpGet:
            scheme: HTTP
            path: /myapp/v1/admin-service/git-info
            port: 9009
      - name: graylog-sidecar
        image: digiapulssi/graylog-sidecar:latest
        volumeMounts:
        - name: logs-dir
          mountPath: /log
        env:
        - name: GS_TAGS
          value: "[\"myapp\"]"
        - name: GS_NODE_ID
          value: "nodeid"
        - name: GS_SERVER_URL
          value: "${GRAYLOG_URL}"
        - name: GS_LIST_LOG_FILES
          value: "[\"/ctwf\"]"
        - name: GS_UPDATE_INTERVAL
          value: "10"
        resources:
          limits:
            memory: "128Mi"
            cpu: "0.1"
Warning: the answer that was flagged as "correct" does not appear to work.
Adding a Liveness Probe to the application container and setting Restart Policy to "Never", will lead to the Pod being stopped and never restarted in a scenario where the sidecar container has stopped and the application container has failed its Liveness Probe. This is a problem, since you DO want the restarts for the application container.
The problem should be solved as follows:
Tweak your sidecar container in the startup command to keep the main process running on failure of the application process. This could be done with an extra piece of scripting, e.g. by appending | tail -f /dev/null to the startup command.
Adding a Liveness Probe to the application container is in general a good idea. Keep in mind though that it only protects you against a scenario where your application process keeps running without your application being in a correct state. It will certainly not override the restartPolicy:
livenessProbe: Indicates whether the container is running. If the liveness probe fails, the kubelet kills the container, and the container is subjected to its restart policy. If a Container does not provide a liveness probe, the default state is Success.
Container Probes
A custom livenessProbe should help, but for your scenario I would use the liveness probe on your main app container, which is myapp. Since you don't care about the sidecar (as mentioned), I would set the pod restartPolicy to Never and then define a custom livenessProbe for your main myapp container. This way the Pod will never restart, no matter which container has failed, but when your myapp container's liveness probe fails the kubelet will restart the container! Reference below:
Pod is running and has two Containers. Container 1 exits with failure.
Log failure event. If restartPolicy is: Always: Restart Container; Pod
phase stays Running. OnFailure: Restart Container; Pod phase stays
Running. Never: Do not restart Container; Pod phase stays Running.
so the updated (pseudo) yaml should look like below
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: myapp
spec:
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1
      maxUnavailable: 0
  template:
    ...
    spec:
      ...
      restartPolicy: Never
      containers:
      - name: myapp
        ...
        livenessProbe:
          exec:
            command:
            - /bin/sh
            - -c
            - {{ your custom liveness check command goes here }}
          failureThreshold: 3
          periodSeconds: 10
          successThreshold: 1
          timeoutSeconds: 1
        readinessProbe:
          ...
      - name: graylog-sidecar
        ...
Note: since I don't know your application, I cannot write the command, but for my JBoss server I use this (as an example for you):
livenessProbe:
  exec:
    command:
    - /bin/sh
    - -c
    - /opt/jboss/wildfly/bin/jboss-cli.sh --connect --commands="read-attribute server-state"
  failureThreshold: 3
  periodSeconds: 10
  successThreshold: 1
  timeoutSeconds: 1
The best solution which works for me is not to fail inside a sidecar container, but just log an error and rerun.
#!/usr/bin/env bash
set -e
# do some stuff which can fail on start
set +e # needed to not exit if command fails
# "command" below is a placeholder for the real sidecar command
while ! command; do
  echo "command failed - rerun"
done
This will always rerun the command if it fails, but exit if the command finished successfully.
You can define a custom livenessProbe for your sidecar with a greater failureThreshold / periodSeconds to accommodate what is considered an acceptable failure rate in your environment, or simply ignore all failures.
Docs:
https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.15/#probe-v1-core
kubectl explain deployment.spec.template.spec.containers.livenessProbe
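A more tolerant sidecar probe, as suggested above, might look like the fragment below. It is a sketch: the probe command is a placeholder and the threshold values are assumptions to be tuned per environment.

```yaml
# Fragment of the graylog-sidecar container spec; the probe command is a placeholder.
- name: graylog-sidecar
  image: digiapulssi/graylog-sidecar:latest
  livenessProbe:
    exec:
      command:
      - /bin/sh
      - -c
      - "true"             # placeholder: always passes; replace with a real check
    periodSeconds: 60      # probe once a minute
    failureThreshold: 10   # tolerate ten consecutive failures before a restart
    timeoutSeconds: 5
```

Note that a liveness probe only applies while the container's process is running. If the sidecar process itself exits, the pod's restartPolicy decides what happens, regardless of the probe settings.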

Is there a way to do a load balancing between pod in multiple nodes?

I have a kubernetes cluster deployed with rke, which is composed of 3 nodes on 3 different servers. On each of those servers there is 1 pod running yatsukino/healthereum, which is a personal modification of ethereum/client-go:stable.
The problem is that I don't understand how to add an external IP to send requests to the pods which are
My pods can be in 3 states:
they are syncing the ethereum blockchain
they restarted because of a sync problem
they are synced and everything is fine
I don't want my load balancer to send requests to pods in the first 2 states; only in the third state do I consider my pod up to date.
I've been searching the kubernetes docs but (maybe because of a misunderstanding) I only find load balancing for pods inside a single node.
Here is my deployment file:
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: goerli
  name: goerli-deploy
spec:
  replicas: 3
  selector:
    matchLabels:
      app: goerli
  template:
    metadata:
      labels:
        app: goerli
    spec:
      containers:
      - image: yatsukino/healthereum
        name: goerli-geth
        args: ["--goerli", "--datadir", "/app", "--ipcpath", "/root/.ethereum/geth.ipc"]
        env:
        - name: LASTBLOCK
          value: "0"
        - name: FAILCOUNTER
          value: "0"
        ports:
        - containerPort: 30303
          name: geth
        - containerPort: 8545
          name: console
        livenessProbe:
          exec:
            command:
            - /bin/sh
            - /app/health.sh
          initialDelaySeconds: 20
          periodSeconds: 60
        volumeMounts:
        - name: app
          mountPath: /app
      initContainers:
      - name: healthcheck
        image: ethereum/client-go:stable
        command: ["/bin/sh", "-c", "wget -O /app/health.sh http://my-bash-script && chmod 544 /app/health.sh"]
        volumeMounts:
        - name: app
          mountPath: "/app"
      restartPolicy: Always
      volumes:
      - name: app
        hostPath:
          path: /app/
The answers above explain the concepts, but regarding your questions about services and external IPs: you must declare the service, for example:
apiVersion: v1
kind: Service
metadata:
  name: goerli
spec:
  selector:
    app: goerli
  ports:
  - port: 8545
  type: LoadBalancer
The type: LoadBalancer will assign an external address in a public cloud, or if you use something like metallb. Check your address with kubectl get svc goerli. If the external address is "pending" you have a problem...
If this is your own setup you can use externalIPs to assign your own external IP:
apiVersion: v1
kind: Service
metadata:
  name: goerli
spec:
  selector:
    app: goerli
  ports:
  - port: 8545
  externalIPs:
  - 222.0.0.30
The externalIPs can be used from outside the cluster but you must route traffic to any node yourself, for example;
ip route add 222.0.0.30/32 \
nexthop via 192.168.0.1 \
nexthop via 192.168.0.2 \
nexthop via 192.168.0.3
This assumes your k8s nodes have IPs 192.168.0.x, and it will set up ECMP routes to your nodes. When you make a request from outside the cluster to 222.0.0.30:8545, k8s will load-balance between your ready pods.
For load balancing and exposing your pods, you can use https://kubernetes.io/docs/concepts/services-networking/service/
and for checking when a pod is ready, you can tweak your liveness and readiness probes as explained in https://kubernetes.io/docs/tasks/configure-pod-container/configure-liveness-readiness-probes/
For probes you might want to consider exec actions, such as executing a script that checks what is required and returns 0 or 1 depending on the status.
When a container is started, Kubernetes can be configured to wait for a configurable
amount of time to pass before performing the first readiness check. After that, it
invokes the probe periodically and acts based on the result of the readiness probe. If a
pod reports that it’s not ready, it’s removed from the service. If the pod then becomes
ready again, it’s re-added.
Unlike liveness probes, if a container fails the readiness check, it won’t be killed or
restarted. This is an important distinction between liveness and readiness probes.
Liveness probes keep pods healthy by killing off unhealthy containers and replacing
them with new, healthy ones, whereas readiness probes make sure that only pods that
are ready to serve requests receive them. This is mostly necessary during container
start up, but it’s also useful after the container has been running for a while.
I think you can use probe for your goal
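Applied to the goerli deployment from the question, a readiness probe could be added alongside the existing liveness probe. The fragment below is a sketch: /app/ready.sh is a hypothetical script (not part of the original setup) that would exit 0 only when the node is fully synced, and the timing values are assumptions.

```yaml
# Fragment for the goerli-geth container; /app/ready.sh is a hypothetical script.
containers:
- name: goerli-geth
  image: yatsukino/healthereum
  readinessProbe:
    exec:
      command:
      - /bin/sh
      - /app/ready.sh      # hypothetical: exit 0 only when fully synced
    initialDelaySeconds: 20
    periodSeconds: 30
    failureThreshold: 1    # drop out of the Service after one failed check
```

While the script exits non-zero, the pod stays running but is removed from the Service endpoints, so pods that are syncing or have just restarted would not receive requests.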

20 percent of requests timeout when a node crashing in k8s. How to solve this?

I was testing my kubernetes services recently, and I found them very unreliable. Here is the situation:
1. The test service 'A', which receives HTTP requests on port 80, has five pods deployed on three nodes.
2. An nginx ingress was set up to route outside traffic to service 'A'.
3. The ingress was set like this:
apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  name: test-A
  annotations:
    nginx.ingress.kubernetes.io/proxy-connect-timeout: "1s"
    nginx.ingress.kubernetes.io/proxy-next-upstream: "error timeout invalid_header http_502 http_503 http_504"
    nginx.ingress.kubernetes.io/proxy-next-upstream-tries: "2"
spec:
  rules:
  - host: <test-url>
    http:
      paths:
      - path: /
        backend:
          serviceName: A
          servicePort: 80
http_load was started on a client host and kept sending requests to the nginx ingress at a rate of 1000 per second. All the requests were routed to service 'A' in k8s and everything went well.
When I restarted one of the nodes manually, things went wrong:
In the next 3 minutes, about 20% of requests timed out, which is unacceptable in a production environment.
I don't know why k8s reacts so slowly. Is there a way to solve this problem?
You can speed up that fail-over process by configuring liveness and readiness probes in the pods' spec:
Container probes
...
livenessProbe: Indicates whether the Container is running. If the liveness probe fails, the kubelet kills the Container, and the Container is subjected to its restart policy. If a Container does not provide a liveness probe, the default state is Success.
readinessProbe: Indicates whether the Container is ready to service requests. If the readiness probe fails, the endpoints controller removes the Pod’s IP address from the endpoints of all Services that match the Pod. The default state of readiness before the initial delay is Failure. If a Container does not provide a readiness probe, the default state is Success.
Liveness probe example:
apiVersion: v1
kind: Pod
metadata:
  labels:
    test: liveness
  name: liveness-exec
spec:
  containers:
  - name: liveness
    image: k8s.gcr.io/busybox
    args:
    - /bin/sh
    - -c
    - touch /tmp/healthy; sleep 30; rm -rf /tmp/healthy; sleep 600
    livenessProbe:
      exec:
        command:
        - cat
        - /tmp/healthy
      initialDelaySeconds: 5
      periodSeconds: 5
Thanks for @VAS's answer!
A liveness probe is one way to solve this problem.
But I finally figured out that what I wanted was a passive health check, which k8s doesn't support.
I solved this problem by introducing istio into my cluster.
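The post does not say which Istio feature was used. In Istio, passive health checking is usually configured as outlier detection on a DestinationRule, so a configuration along these lines is one plausible reading. The resource name, host name, and all thresholds are assumptions, not taken from the post.

```yaml
# Sketch under assumptions: name, host, and thresholds are hypothetical.
apiVersion: networking.istio.io/v1beta1
kind: DestinationRule
metadata:
  name: service-a-outlier
spec:
  host: service-a.default.svc.cluster.local   # hypothetical service DNS name
  trafficPolicy:
    outlierDetection:
      consecutive5xxErrors: 3   # eject an endpoint after 3 consecutive 5xx errors
      interval: 5s              # how often endpoints are evaluated
      baseEjectionTime: 30s     # minimum time an ejected endpoint stays out
      maxEjectionPercent: 50    # never eject more than half of the endpoints
```

Unlike liveness and readiness probes, this acts on the results of real requests, so endpoints on a crashed node can be ejected without waiting for the control plane to mark the node as not ready.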

Terminate a pod when container dies

I want to terminate a pod when a container dies, but I have not found an efficient way of doing it.
I can kill the pod using kubectl, but I want the pod to be killed/restarted automatically whenever any container in the pod restarts.
Can this task be achieved using an operator?
There's a way: you have to add a livenessProbe configuration with restartPolicy: Never in your pod config.
The livenessProbe detects container failures.
When the container dies, as restartPolicy is Never, the pod status becomes Failed.
For example;
apiVersion: v1
kind: Pod
metadata:
  labels:
    test: liveness
  name: liveness-http
spec:
  restartPolicy: Never
  containers:
  - args:
    - /server
    image: k8s.gcr.io/liveness
    livenessProbe:
      httpGet:
        # when "host" is not defined, "PodIP" will be used
        # host: my-host
        # when "scheme" is not defined, "HTTP" scheme will be used. Only "HTTP" and "HTTPS" are allowed
        # scheme: HTTPS
        path: /healthz
        port: 8080
        httpHeaders:
        - name: X-Custom-Header
          value: Awesome
      initialDelaySeconds: 15
      timeoutSeconds: 1
    name: liveness
Here's the reference; https://kubernetes.io/docs/concepts/workloads/pods/pod-lifecycle/#pod-phase
If it is not critical to terminate a pod immediately when one of the containers fails, you can think about setting a global timeout for the entire pod. This can be achieved by setting activeDeadlineSeconds as specified here for the v.1.17: https://v1-17.docs.kubernetes.io/docs/reference/generated/kubernetes-api/v1.17/#pod-v1-core
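As a minimal sketch of that option, activeDeadlineSeconds sits at the pod spec level, next to restartPolicy. The pod name and the 600-second value are illustrative assumptions.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: deadline-example       # hypothetical name
spec:
  restartPolicy: Never
  activeDeadlineSeconds: 600   # illustrative: the pod is marked Failed after 10 minutes
  containers:
  - name: liveness
    image: k8s.gcr.io/liveness
    args:
    - /server
```

The deadline applies to the pod as a whole and is counted from when the pod starts, so it bounds total runtime rather than reacting to an individual container failure.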