EKS, Windows node: networkPlugin cni failed

Per https://docs.aws.amazon.com/eks/latest/userguide/windows-support.html, I ran the command eksctl utils install-vpc-controllers --cluster <cluster_name> --approve.
My EKS version is v1.16.3. I tried to deploy Windows Docker images to a Windows node and got the error below.
Warning FailedCreatePodSandBox 31s kubelet, ip-west-2.compute.internal Failed create pod sandbox: rpc error: code = Unknown desc = failed to set up sandbox container "ab8001f7b01f5c154867b7e" network for pod "mrestapi-67fb477548-v4njs": networkPlugin cni failed to set up pod "mrestapi-67fb477548-v4njs_ui" network: failed to parse Kubernetes args: pod does not have label vpc.amazonaws.com/PrivateIPv4Address
$ kubectl logs vpc-resource-controller-645d6696bc-s5rhk -n kube-system
I1010 03:40:29.041761 1 leaderelection.go:185] attempting to acquire leader lease kube-system/vpc-resource-controller...
I1010 03:40:46.453557 1 leaderelection.go:194] successfully acquired lease kube-system/vpc-resource-controller
W1010 23:57:53.972158 1 reflector.go:341] pkg/mod/k8s.io/client-go@v0.0.0-20180910083459-2cefa64ff137/tools/cache/reflector.go:99: watch of *v1.Pod ended with: too old resource version: 1480444 (1515040)
It complains about a too old resource version. How do I upgrade the version?

I removed the Windows nodes and re-created them with a different instance type, but it did not work.
I removed the Windows node group and re-created it. It did not work.
Finally, I removed the entire EKS cluster and re-created it. The command kubectl describe node <windows_node> gives me the output below.
vpc.amazonaws.com/CIDRBlock 0 0
vpc.amazonaws.com/ENI 0 0
vpc.amazonaws.com/PrivateIPv4Address 1 1
Deployed windows-server-iis.yaml. It works as expected. The root cause of the problem remains a mystery.

To troubleshoot this I would...
First list the components to make sure they're running:
$ kubectl get pod -n kube-system | grep vpc
vpc-admission-webhook-deployment-7f67d7b49-wgzbg 1/1 Running 0 38h
vpc-resource-controller-595bfc9d98-4mb2g 1/1 Running 0 29
If they are running, check their logs:
kubectl logs <vpc-yadayada> -n kube-system
Make sure the instance type you are using has enough available IPs per ENI. In the Windows world, only one ENI is used, and it is limited to the maximum available IPs per ENI minus one for the primary IP address. I have run into this error before when I exceeded the number of IPs available to my ENI.
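If it helps, the per-ENI limits for a given instance type can be pulled from the AWS CLI (a hedged sketch; m5.large is just an example instance type, substitute your own):
$ aws ec2 describe-instance-types --instance-types m5.large \
    --query 'InstanceTypes[].NetworkInfo.{MaxENIs:MaximumNetworkInterfaces,IPv4PerENI:Ipv4AddressesPerInterface}' \
    --output table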
Confirm that your pod's node selector is right:
nodeSelector:
  kubernetes.io/os: windows
  kubernetes.io/arch: amd64
As an anecdote, I have done the steps mentioned under the "To enable Windows support for your cluster with a macOS or Linux client" section of the doc you linked on a few clusters to date, and they have worked well.

What is your output for kubectl describe node <windows_node>?
If it's like:
vpc.amazonaws.com/CIDRBlock: 0
vpc.amazonaws.com/ENI: 0
vpc.amazonaws.com/PrivateIPv4Address: 0
then you need to re-create the node group with a different instance type...
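If you use eksctl, recreating the node group could look roughly like this (a hedged sketch; the node group name windows-ng, the AMI family, and the instance type are assumptions to adapt to your setup):
$ eksctl delete nodegroup --cluster <cluster_name> --name windows-ng
$ eksctl create nodegroup --cluster <cluster_name> --name windows-ng \
    --node-ami-family WindowsServer2019FullContainer \
    --node-type m5.xlarge --nodes 2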
Then try to deploy this:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: windows-server-iis-test
  namespace: default
spec:
  selector:
    matchLabels:
      app: windows-server-iis-test
      tier: backend
      track: stable
  replicas: 1
  template:
    metadata:
      labels:
        app: windows-server-iis-test
        tier: backend
        track: stable
    spec:
      containers:
        - name: windows-server-iis-test
          image: mcr.microsoft.com/windows/servercore:1809
          ports:
            - name: http
              containerPort: 80
          imagePullPolicy: IfNotPresent
          command:
            - powershell.exe
            - -command
            - "Add-WindowsFeature Web-Server; Invoke-WebRequest -UseBasicParsing -Uri 'https://dotnetbinaries.blob.core.windows.net/servicemonitor/2.0.1.6/ServiceMonitor.exe' -OutFile 'C:\\ServiceMonitor.exe'; echo '<html><body><br/><br/><marquee><H1>Hello EKS!!!<H1><marquee></body><html>' > C:\\inetpub\\wwwroot\\default.html; C:\\ServiceMonitor.exe 'w3svc'; "
          resources:
            limits:
              cpu: 256m
              memory: 256Mi
            requests:
              cpu: 128m
              memory: 100Mi
      nodeSelector:
        kubernetes.io/os: windows
---
apiVersion: v1
kind: Service
metadata:
  name: windows-server-iis-test
  namespace: default
spec:
  ports:
    - port: 80
      protocol: TCP
      targetPort: 80
  selector:
    app: windows-server-iis-test
    tier: backend
    track: stable
  sessionAffinity: None
  type: ClusterIP
kubectl proxy
Then open http://localhost:8001/api/v1/namespaces/default/services/http:windows-server-iis-test:80/proxy/default.html in a browser; it will show a web page with the "Hello EKS" text.
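To double-check that the Windows VPC components actually added the vpc.amazonaws.com/PrivateIPv4Address label from your original error, you can inspect the test pod (a hedged one-liner, assuming the deployment above):
$ kubectl describe pod -l app=windows-server-iis-test | grep -i privateipv4address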

Related

How to restrict a pod to connect only to 2 pods using NetworkPolicy, and test the connection in k8s in a simple way?

Do I still need to expose the pod via a ClusterIP service?
There are 3 pods: main, front, api. I need to allow ingress+egress connections to the main pod only from the api and front pods. I also created service-main, a service that exposes the main pod on port 80.
I don't know how to test it; I tried:
k exec main -it -- sh
nc -z -v -w 5 service-main 80
and
k exec main -it -- sh
curl front:80
The main.yaml pod:
apiVersion: v1
kind: Pod
metadata:
  labels:
    app: main
    item: c18
  name: main
spec:
  containers:
    - image: busybox
      name: main
      command:
        - /bin/sh
        - -c
        - sleep 1d
The front.yaml:
apiVersion: v1
kind: Pod
metadata:
  labels:
    app: front
  name: front
spec:
  containers:
    - image: busybox
      name: front
      command:
        - /bin/sh
        - -c
        - sleep 1d
The api.yaml
apiVersion: v1
kind: Pod
metadata:
  labels:
    app: api
  name: api
spec:
  containers:
    - image: busybox
      name: api
      command:
        - /bin/sh
        - -c
        - sleep 1d
The main-to-front-networkpolicy.yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: front-end-policy
spec:
  podSelector:
    matchLabels:
      app: main
  policyTypes:
    - Ingress
    - Egress
  ingress:
    - from:
        - podSelector:
            matchLabels:
              app: front
      ports:
        - port: 8080
  egress:
    - to:
        - podSelector:
            matchLabels:
              app: front
      ports:
        - port: 8080
What am I doing wrong? Do I still need to expose the main pod via a service? But shouldn't the network policy take care of this already?
Also, do I need to write containerPort: 80 in the main pod? How do I test connectivity and ensure ingress/egress works only between the main pod and the api and front pods?
I tried the lab from a CKAD prep course; it had 2 pods: secure-pod and web-pod. There was an issue with connectivity, and the solution was to create a network policy and test using netcat from inside the web-pod's container:
k exec web-pod -it -- sh
nc -z -v -w 1 secure-service 80
connection open
UPDATE: ideally I want answers to these:
a clear explanation of the difference between a Service and a NetworkPolicy.
If both a Service and a NetworkPolicy exist, what is the order of evaluation that the traffic/request goes through? Does it first go through the NetworkPolicy and then the Service, or vice versa?
If I want the front and api pods to send/receive traffic to main, do I need separate services exposing the front and api pods?
Network policies and services are two different and independent Kubernetes resources.
Service is:
An abstract way to expose an application running on a set of Pods as a network service.
Good explanation from the Kubernetes docs:
Kubernetes Pods are created and destroyed to match the state of your cluster. Pods are nonpermanent resources. If you use a Deployment to run your app, it can create and destroy Pods dynamically.
Each Pod gets its own IP address, however in a Deployment, the set of Pods running in one moment in time could be different from the set of Pods running that application a moment later.
This leads to a problem: if some set of Pods (call them "backends") provides functionality to other Pods (call them "frontends") inside your cluster, how do the frontends find out and keep track of which IP address to connect to, so that the frontend can use the backend part of the workload?
Enter Services.
Also another good explanation in this answer.
For production you should use workload resources instead of creating pods directly:
Pods are generally not created directly and are created using workload resources. See Working with Pods for more information on how Pods are used with workload resources.
Here are some examples of workload resources that manage one or more Pods:
Deployment
StatefulSet
DaemonSet
And use services to make requests to your application.
Network policies are used to control traffic flow:
If you want to control traffic flow at the IP address or port level (OSI layer 3 or 4), then you might consider using Kubernetes NetworkPolicies for particular applications in your cluster.
Network policies target pods, not services (an abstraction). Check this answer and this one.
Regarding your examples - your network policy is correct (as I tested it below). The problem is that your cluster may not be compatible:
For Network Policies to take effect, your cluster needs to run a network plugin which also enforces them. Project Calico or Cilium are plugins that do so. This is not the default when creating a cluster!
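A quick, hedged way to check whether such a plugin is present (pod names vary by installation, and some CNIs run in their own namespace):
$ kubectl get pods -A | grep -Ei 'calico|cilium|weave'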
Test on a kubeadm cluster with the Calico plugin: I created similar pods as you did, but I changed the container part:
spec:
  containers:
    - name: main
      image: nginx
      command: ["/bin/sh","-c"]
      args: ["sed -i 's/listen .*/listen 8080;/g' /etc/nginx/conf.d/default.conf && exec nginx -g 'daemon off;'"]
      ports:
        - containerPort: 8080
So the NGINX app is available on port 8080.
Let's check the pod IPs:
user@shell:~$ kubectl get pods -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
api 1/1 Running 0 48m 192.168.156.61 example-ubuntu-kubeadm-template-2 <none> <none>
front 1/1 Running 0 48m 192.168.156.56 example-ubuntu-kubeadm-template-2 <none> <none>
main 1/1 Running 0 48m 192.168.156.52 example-ubuntu-kubeadm-template-2 <none> <none>
Let's exec into the running main pod and try to make a request to the api pod (192.168.156.61):
root@main:/# curl 192.168.156.61:8080
<!DOCTYPE html>
...
<title>Welcome to nginx!</title>
It is working.
After applying your network policy:
user@shell:~$ kubectl apply -f main-to-front.yaml
networkpolicy.networking.k8s.io/front-end-policy created
user@shell:~$ kubectl exec -it main -- bash
root@main:/# curl 192.168.156.61:8080
...
It is not working anymore, which means the network policy was applied successfully.
A nice way to get more information about an applied network policy is to run the kubectl describe command:
user@shell:~$ kubectl describe networkpolicy front-end-policy
Name: front-end-policy
Namespace: default
Created on: 2022-01-26 15:17:58 +0000 UTC
Labels: <none>
Annotations: <none>
Spec:
  PodSelector:     app=main
  Allowing ingress traffic:
    To Port: 8080/TCP
    From:
      PodSelector: app=front
  Allowing egress traffic:
    To Port: 8080/TCP
    To:
      PodSelector: app=front
  Policy Types: Ingress, Egress
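To also exercise the ingress side of the policy, you could curl the main pod from the front and api pods (a hedged example, assuming all three pods run the same modified NGINX container on port 8080 and have curl available):
user@shell:~$ kubectl exec -it front -- curl 192.168.156.52:8080              # allowed: app=front -> app=main on 8080
user@shell:~$ kubectl exec -it api -- curl --max-time 5 192.168.156.52:8080   # blocked by the policy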

kubernetes "unable to get metrics"

I am trying to autoscale a deployment and a statefulset by running, respectively, these two commands:
kubectl autoscale statefulset mysql --cpu-percent=50 --min=1 --max=10
kubectl expose deployment frontend --type=LoadBalancer --name=frontend
Sadly, on the minikube dashboard, this error appears under both services:
failed to get cpu utilization: unable to get metrics for resource cpu: unable to fetch metrics from resource metrics API: the server could not find the requested resource (get pods.metrics.k8s.io)
Searching online I read that it might be a DNS error, so I checked, but CoreDNS seems to be running fine.
Both workloads are nothing special; this is the 'frontend' deployment:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: frontend
  labels:
    app: frontend
spec:
  replicas: 3
  selector:
    matchLabels:
      app: frontend
  template:
    metadata:
      labels:
        app: frontend
    spec:
      containers:
        - name: frontend
          image: hubuser/repo
          ports:
            - containerPort: 3000
Has anyone got any suggestions?
First of all, could you please verify if the API is working fine? To do so, please run kubectl get --raw /apis/metrics.k8s.io/v1beta1.
If you get an error similar to:
“Error from server (NotFound):”
Please follow these steps:
1. Remove all the proxy environment variables from the kube-apiserver manifest.
2. In the kube-controller-manager-amd64 manifest, set --horizontal-pod-autoscaler-use-rest-clients=false.
3. The last scenario is that your metrics-server add-on is disabled by default. You can verify it by using:
$ minikube addons list
If it is disabled, you will see something like metrics-server: disabled.
You can enable it by using:
$ minikube addons enable metrics-server
When it is done, delete and recreate your HPA.
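For example, the whole sequence might look like this (a hedged sketch, reusing the resource names from the question):
$ minikube addons enable metrics-server
$ kubectl delete hpa mysql                 # remove the old HPA created by kubectl autoscale
$ kubectl autoscale statefulset mysql --cpu-percent=50 --min=1 --max=10
$ kubectl get hpa                          # after a minute or two, TARGETS should show real CPU values
$ kubectl top pods                         # confirms the Metrics API is serving data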
You can use the following thread as a reference.

Kubernetes clusterIP does not load balance requests [duplicate]

My Environment: Mac dev machine with the latest Minikube/Docker.
I built (locally) a simple Docker image with a simple Django REST API "hello world". I'm running a deployment with 3 replicas. This is my yaml file for defining it:
apiVersion: v1
kind: Service
metadata:
  name: myproj-app-service
  labels:
    app: myproj-be
spec:
  type: LoadBalancer
  ports:
    - port: 8000
  selector:
    app: myproj-be
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: myproj-app-deployment
  labels:
    app: myproj-be
spec:
  replicas: 3
  selector:
    matchLabels:
      app: myproj-be
  template:
    metadata:
      labels:
        app: myproj-be
    spec:
      containers:
        - name: myproj-app-server
          image: myproj-app-server:4
          ports:
            - containerPort: 8000
          env:
            - name: DATABASE_URL
              value: postgres://myname:@10.0.2.2:5432/myproj2
            - name: REDIS_URL
              value: redis://10.0.2.2:6379/1
When I apply this yaml it generates things correctly.
- one deployment
- one service
- three pods
Deployments:
NAME READY UP-TO-DATE AVAILABLE AGE
myproj-app-deployment 3/3 3 3 79m
Services:
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
kubernetes ClusterIP 10.96.0.1 <none> 443/TCP 83m
myproj-app-service LoadBalancer 10.96.91.44 <pending> 8000:31559/TCP 79m
Pods:
NAME READY STATUS RESTARTS AGE
myproj-app-deployment-77664b5557-97wkx 1/1 Running 0 48m
myproj-app-deployment-77664b5557-ks7kf 1/1 Running 0 49m
myproj-app-deployment-77664b5557-v9889 1/1 Running 0 49m
The interesting thing is that when I SSH into Minikube and hit the service using curl 10.96.91.44:8000, it respects the LoadBalancer type of the service and rotates between all three pods as I hit the endpoint time and again. I can see that in the returned results, which I have made sure include the HOSTNAME of the pod.
However, when I try to access the service from my host Mac -- using kubectl port-forward service/myproj-app-service 8000:8000 -- every time I hit the endpoint, the same pod responds. It doesn't load balance. I can see that clearly when I kubectl logs -f <pod> on all three pods: only one of them is handling the hits, while the other two are idle...
Is this a kubectl port-forward limitation or issue? Or am I missing something greater here?
kubectl port-forward looks up the first Pod from the Service information provided on the command line and forwards directly to a Pod rather than forwarding to the ClusterIP/Service port. The cluster doesn't get a chance to load balance the service like regular service traffic.
The kubernetes API only provides Pod port forward operations (CREATE and GET). Similar API operations don't exist for Service endpoints.
kubectl code
Here's a little bit of the flow from the kubectl code that seems to back that up (I'll just add that Go isn't my primary language)
The portforward.go Complete function is where kubectl port-forward does the first lookup for a pod from options via AttachablePodForObjectFn.
The AttachablePodForObjectFn is defined as attachablePodForObject in this interface, then here is the attachablePodForObject function.
To my (inexperienced) Go eyes, it appears attachablePodForObject is the thing kubectl uses to look up a Pod from a Service defined on the command line.
Then from there on everything deals with filling in the Pod specific PortForwardOptions (which doesn't include a service) and is passed to the kubernetes API.
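If you want to exercise the Service's normal load balancing from the host instead, one minikube-specific option (a hedged sketch, not part of the original answer) is to reach the Service through the NodePort that minikube exposes rather than kubectl port-forward:
# Print a host-reachable URL for the Service
minikube service myproj-app-service --url
# Repeated curls against that URL go through kube-proxy and rotate across the pods
curl "$(minikube service myproj-app-service --url)"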
The reason was that my pods were randomly in a crashing state due to Python *.pyc files that were left in the container. This causes issues when Django is running in a multi-pod Kubernetes deployment. Once I removed this issue and all pods ran successfully, the round-robin started working.
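If it is useful, one hedged mitigation for the runtime side of that problem is to stop CPython from writing new .pyc files inside the container (this is only an assumption about the image; any .pyc files already baked into it still need to be cleaned up at build time):
          env:
            - name: PYTHONDONTWRITEBYTECODE   # tell CPython not to write .pyc files at runtime
              value: "1"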

k0s kubectl exec and kubectl port-forwarding are broken

I have a simple nginx pod and a k0s cluster set up with the k0s binary. Now I want to connect to that pod, but I get this error:
$ kubectl port-forward frontend-deployment-786ddcb47-p5kkv 7000:80
error: error upgrading connection: error dialing backend: rpc error: code = Unavailable
desc = connection error: desc = "transport: Error while dialing dial unix /var/lib/k0s/run/konnectivity-server/konnectivity-server.sock: connect: connection refused"
I don't understand why this happens and why it tries to access /var/lib/k0s/run/konnectivity-server/konnectivity-server.sock, which does not exist on my machine.
Do I have to add my local dev machine with k0s to the cluster?
Extract from pod describe:
Containers:
frontend:
Container ID: containerd://897a8911cd31c6d58aef4b22da19dc8166cb7de713a7838bc1e486e497e9f1b2
Image: nginx:1.16
Image ID: docker.io/library/nginx@sha256:d20aa6d1cae56fd17cd458f4807e0de462caf2336f0b70b5eeb69fcaaf30dd9c
Port: 80/TCP
Host Port: 0/TCP
State: Running
Started: Thu, 28 Jan 2021 14:20:58 +0100
Ready: True
Restart Count: 0
Environment: <none>
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 3m43s default-scheduler Successfully assigned remove-me/frontend-deployment-786ddcb47-p5kkv to k0s-worker-2
Normal Pulling 3m42s kubelet Pulling image "nginx:1.16"
Normal Pulled 3m33s kubelet Successfully pulled image "nginx:1.16" in 9.702313183s
Normal Created 3m32s kubelet Created container frontend
Normal Started 3m32s kubelet Started container frontend
deployment.yml and service.yml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: frontend-deployment
  labels:
    app: frontend
spec:
  replicas: 2
  selector:
    matchLabels:
      app: frontend
  template:
    metadata:
      labels:
        app: frontend
    spec:
      containers:
        - name: frontend
          image: nginx:1.16
          ports:
            - containerPort: 80
---
apiVersion: v1
kind: Service
metadata:
  name: frontend-service
spec:
  selector:
    app: frontend
  ports:
    - protocol: TCP
      port: 80
      targetPort: 80
The workaround is to just remove the file /var/lib/k0s/run/konnectivity-server/konnectivity-server.sock and restart the server.
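In shell form that workaround might look like this (a hedged sketch; the k0scontroller service name assumes k0s was installed as a systemd service with k0s install controller, so adjust for workers or a different init system):
$ sudo rm /var/lib/k0s/run/konnectivity-server/konnectivity-server.sock
$ sudo systemctl restart k0scontroller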
Currently my GitHub issue is still open:
https://github.com/k0sproject/k0s/issues/665
I had a similar issue where /var/lib/k0s/run/konnectivity-server/konnectivity-server.sock was not getting created (in my case the path was /run/k0s/konnectivity-server/konnectivity-server.sock).
I changed my host configuration, which finally fixed this issue.
Hostname: the hostnames on my nodes were in uppercase, but k0s somehow expected them to be in lowercase. We can override the hostname in the configuration file, but that still did not fix the konnectivity sock issue, so I had to reset all my node hostnames to lowercase.
This change finally fixed my issue; on my first attempt I saw the .sock file was getting created but it still didn't have the right permission. Then I followed the suggestion given above by TecBeast and that fixed the issue permanently.
This issue was then raised as an Issue in the K0s GitHub repository, and fixed in a Pull Request.

Is there a way to do load balancing between pods on multiple nodes?

I have a Kubernetes cluster deployed with RKE which is composed of 3 nodes on 3 different servers, and on each server there is 1 pod running yatsukino/healthereum, which is a personal modification of ethereum/client-go:stable.
The problem is that I don't understand how to add an external IP to send requests to the pods.
My pods can be in 3 states:
they are syncing the Ethereum blockchain
they restarted because of a sync problem
they are synced and everything is fine
I don't want my load balancer to forward requests to pods in the first 2 states; only in the third state should a pod be considered up to date.
I've been searching in the Kubernetes docs, but (maybe because of a misunderstanding) I only found load balancing for pods inside a single node.
Here is my deployment file:
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: goerli
  name: goerli-deploy
spec:
  replicas: 3
  selector:
    matchLabels:
      app: goerli
  template:
    metadata:
      labels:
        app: goerli
    spec:
      containers:
        - image: yatsukino/healthereum
          name: goerli-geth
          args: ["--goerli", "--datadir", "/app", "--ipcpath", "/root/.ethereum/geth.ipc"]
          env:
            - name: LASTBLOCK
              value: "0"
            - name: FAILCOUNTER
              value: "0"
          ports:
            - containerPort: 30303
              name: geth
            - containerPort: 8545
              name: console
          livenessProbe:
            exec:
              command:
                - /bin/sh
                - /app/health.sh
            initialDelaySeconds: 20
            periodSeconds: 60
          volumeMounts:
            - name: app
              mountPath: /app
      initContainers:
        - name: healthcheck
          image: ethereum/client-go:stable
          command: ["/bin/sh", "-c", "wget -O /app/health.sh http://my-bash-script && chmod 544 /app/health.sh"]
          volumeMounts:
            - name: app
              mountPath: "/app"
      restartPolicy: Always
      volumes:
        - name: app
          hostPath:
            path: /app/
The answers above explain the concepts, but regarding your questions about services and external IPs: you must declare the service, for example:
apiVersion: v1
kind: Service
metadata:
  name: goerli
spec:
  selector:
    app: goerli
  ports:
    - port: 8545
  type: LoadBalancer
The type: LoadBalancer will assign an external address in a public cloud or if you use something like MetalLB. Check your address with kubectl get svc goerli. If the external address is "pending" you have a problem...
If this is your own setup you can use externalIPs to assign your own external IP:
apiVersion: v1
kind: Service
metadata:
  name: goerli
spec:
  selector:
    app: goerli
  ports:
    - port: 8545
  externalIPs:
    - 222.0.0.30
The externalIPs can be used from outside the cluster, but you must route traffic to any node yourself, for example:
ip route add 222.0.0.30/32 \
nexthop via 192.168.0.1 \
nexthop via 192.168.0.2 \
nexthop via 192.168.0.3
Assuming your k8s nodes have IPs 192.168.0.x, this will set up ECMP routes to your nodes. When you make a request from outside the cluster to 222.0.0.30:8545, k8s will load-balance between your ready pods.
For load balancing and exposing your pods, you can use https://kubernetes.io/docs/concepts/services-networking/service/
and for checking when a pod is ready, you can tweak your liveness and readiness probes as explained at https://kubernetes.io/docs/tasks/configure-pod-container/configure-liveness-readiness-probes/
For probes you might want to consider exec actions, like executing a script that checks what is required and returns 0 or 1 depending on the status.
When a container is started, Kubernetes can be configured to wait for a configurable
amount of time to pass before performing the first readiness check. After that, it
invokes the probe periodically and acts based on the result of the readiness probe. If a
pod reports that it’s not ready, it’s removed from the service. If the pod then becomes
ready again, it’s re-added.
Unlike liveness probes, if a container fails the readiness check, it won’t be killed or
restarted. This is an important distinction between liveness and readiness probes.
Liveness probes keep pods healthy by killing off unhealthy containers and replacing
them with new, healthy ones, whereas readiness probes make sure that only pods that
are ready to serve requests receive them. This is mostly necessary during container
start up, but it’s also useful after the container has been running for a while.
I think you can use probes for your goal.
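As a hedged sketch of that idea, a readinessProbe could sit next to the existing livenessProbe in the goerli container and run a script that only exits 0 once the node is fully synced (the /app/ready.sh path and the timings are assumptions):
          readinessProbe:
            exec:
              command:
                - /bin/sh
                - /app/ready.sh      # exit 0 only when the chain is fully synced
            initialDelaySeconds: 30
            periodSeconds: 15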