Kubernetes Autoscaling not recognizing Heapster - kubernetes

I have a kubernetes cluster that I'm trying to build from scratch without using their build scripts. Pretty much everything is working except for autoscaling. For some reason the control-manager can't find or doesn't know heapster is running.
I have a ticket open but it seems no responses
https://github.com/kubernetes/kubernetes/issues/18652
UPDATE: the issue has been answered on Github
Things I have setup.
Here is a list of all the pods currently
[root#kube-master test] [dev] # kubectl get pods --all-namespaces
NAMESPACE NAME READY STATUS RESTARTS AGE
default my-nginx-8kmlz 1/1 Running 0 11h
default my-nginx-z8cxb 1/1 Running 0 11h
kube-system heapster-v10-vdc1v 3/3 Running 0 11h
kube-system kube-apiserver-10.122.0.20 1/1 Running 0 4d
kube-system kube-controller-manager-10.122.0.20 1/1 Running 1 9h
kube-system kube-dns-6iw3a 4/4 Running 0 4d
kube-system kube-proxy-10.122.0.20 1/1 Running 0 3d
kube-system kube-proxy-10.122.42.163 1/1 Running 0 4d
kube-system kube-proxy-10.122.43.138 1/1 Running 1 4d
kube-system kube-scheduler-10.122.0.20 1/1 Running 1 4d
So heapster is running against my proxy and I can access
http://10.122.0.20:8080/api/v1/proxy/namespaces/kube-system/services/heapster/api/v1/model/namespaces/default/pods/my-nginx-8kmlz/stats
It returns stats about the pod.
I'm really not sure what i am missing.
Here is what the output of a autoscale looks like
[root#kube-master test] [dev] # kubectl get hpa
NAME REFERENCE TARGET CURRENT MINPODS MAXPODS AGE
my-nginx ReplicationController/my-nginx/scale 80% <waiting> 1 5 22h
In my controller logs the only thing I really see is
W1224 18:27:43.425126 1 horizontal.go:185] Failed to reconcile my-nginx: failed to compute desired number of replicas based on CPU utilization for ReplicationController/default/my-nginx: failed to get cpu utilization: failed to get CPU consumption and request: some pods do not have request for cpu

you need to assign cpu request / limit for pods in deployment file example
resources:
requests:
cpu: "100m"
limits:
cpu: "250m"

Some time this happen because resource metrics not enabled.
you can verify with blow command:
kubectl top pod -n <namespace>
if you are getting pods then metrics is enabled:
https://kubernetes.io/docs/tasks/debug-application-cluster/resource-usage-monitoring/#resource-metrics-pipeline

Related

How are pods in kube-system namespace managed?

I'm trying to understand how kubernetes works, so I tried to do this operation for my minikube:
~ kubectl delete pod --all -n kube-system
pod "coredns-f9fd979d6-5n4b6" deleted
pod "etcd-minikube" deleted
pod "kube-apiserver-minikube" deleted
pod "kube-controller-manager-minikube" deleted
pod "kube-proxy-879lg" deleted
pod "kube-scheduler-minikube" deleted
It's okay. Pods deleted as wish. But if I do kubectl get pods -n kube-system I will see:
NAME READY STATUS RESTARTS AGE
coredns-f9fd979d6-5d25r 1/1 Running 0 50s
etcd-minikube 1/1 Running 0 50s
kube-apiserver-minikube 1/1 Running 0 50s
kube-controller-manager-minikube 1/1 Running 0 50s
kube-proxy-nlw69 1/1 Running 0 43s
kube-scheduler-minikube 1/1 Running 0 49s
Okay. I thought it's ReplicaSet or DaemonSet:
➜ ~ kubectl get ds -n kube-system
NAME DESIRED CURRENT READY UP-TO-DATE AVAILABLE NODE SELECTOR AGE
kube-proxy 1 1 1 1 1 kubernetes.io/os=linux 18m
➜ ~ kubectl get rs -n kube-system
NAME DESIRED CURRENT READY AGE
coredns-f9fd979d6 1 1 1 18m
It is true for coredns and kube-proxy. But what about others (apiserver, etcd, controller and scheduler)? Why are they still alive?
The control plane pods are run as static Pods - static Pods are not managed by the control plane controllers like e.g. DaemonSet and ReplicaSet. Static pods are instead managed by the Kubelet daemon on the local node directly.

Kubernetes coredns pods stuck in Pending status. Cannot start the dashboard

I am building a Kubernetes cluster following this tutorial, and I have troubles to access the Kubernetes dashboard. I already created another question about it that you can see here, but while digging up into my cluster, I think that the problem might be somewhere else and that's why I create a new question.
I start my master, by running the following commands:
> kubeadm reset
> kubeadm init --apiserver-advertise-address=[MASTER_IP] > file.txt
> tail -2 file.txt > join.sh # I keep this file for later
> kubectl apply -f https://git.io/weave-kube/
> kubectl -n kube-system get pod
NAME READY STATUS RESTARTS AGE
coredns-fb8b8dccf-kb2zq 0/1 Pending 0 2m46s
coredns-fb8b8dccf-nnc5n 0/1 Pending 0 2m46s
etcd-kubemaster 1/1 Running 0 93s
kube-apiserver-kubemaster 1/1 Running 0 93s
kube-controller-manager-kubemaster 1/1 Running 0 113s
kube-proxy-lxhvs 1/1 Running 0 2m46s
kube-scheduler-kubemaster 1/1 Running 0 93s
Here we can see that I have two coredns pods stuck in Pending state forever, and when I run the command :
> kubectl -n kube-system describe pod coredns-fb8b8dccf-kb2zq
I can see in the Events part the following Warning :
Failed Scheduling : 0/1 nodes are available 1 node(s) had taints that the pod didn't tolerate.
Since it is a Warning and not and Error, and that as a Kubernetes newbie, taints does not mean much to me, I tried to connect a node to the master (using the previously saved command) :
> cat join.sh
kubeadm join [MASTER_IP]:6443 --token [TOKEN] \
--discovery-token-ca-cert-hash sha256:[ANOTHER_TOKEN]
> ssh [USER]#[WORKER_IP] 'bash' < join.sh
This node has joined the cluster.
On the master, I check that the node is connected:
> kubectl get nodes
NAME STATUS ROLES AGE VERSION
kubemaster NotReady master 13m v1.14.1
kubeslave1 NotReady <none> 31s v1.14.1
And I check my pods :
> kubectl -n kube-system get pod
NAME READY STATUS RESTARTS AGE
coredns-fb8b8dccf-kb2zq 0/1 Pending 0 14m
coredns-fb8b8dccf-nnc5n 0/1 Pending 0 14m
etcd-kubemaster 1/1 Running 0 13m
kube-apiserver-kubemaster 1/1 Running 0 13m
kube-controller-manager-kubemaster 1/1 Running 0 13m
kube-proxy-lxhvs 1/1 Running 0 14m
kube-proxy-xllx4 0/1 ContainerCreating 0 2m16s
kube-scheduler-kubemaster 1/1 Running 0 13m
We can see that another kube-proxy pod have been created and is stuck in ContainerCreating status.
And when I am doing a describe again :
kubectl -n kube-system describe pod kube-proxy-xllx4
I can see in the Events part multiple identical Warnings :
Failed create pod sandbox : rpx error: code = Unknown desc = failed pulling image "k8s.gcr.io/pause:3.1": Get https://k8s.gcr.io/v1/_ping: dial tcp: lookup k8s.gcr.io on [::1]:53 read up [::1]43133->[::1]:53: read: connection refused
Here are my repositories :
docker image ls
REPOSITORY TAG
k8s.gcr.io/kube-proxy v1.14.1
k8s.gcr.io/kube-apiserver v1.14.1
k8s.gcr.io/kube-controller-manager v1.14.1
k8s.gcr.io/kube-scheduler v1.14.1
k8s.gcr.io/coredns 1.3.1
k8s.gcr.io/etcd 3.3.10
k8s.gcr.io/pause 3.1
And so, for the dashboard part, I tried to start it with the command
> kubectl apply -f https://raw.githubusercontent.com/kubernetes/dashboard/master/aio/deploy/recommended/kubernetes-dashboard.yaml
But the dashboard pod is stuck in Pending state.
kubectl -n kube-system get pod
NAME READY STATUS RESTARTS AGE
coredns-fb8b8dccf-kb2zq 0/1 Pending 0 40m
coredns-fb8b8dccf-nnc5n 0/1 Pending 0 40m
etcd-kubemaster 1/1 Running 0 38m
kube-apiserver-kubemaster 1/1 Running 0 38m
kube-controller-manager-kubemaster 1/1 Running 0 39m
kube-proxy-lxhvs 1/1 Running 0 40m
kube-proxy-xllx4 0/1 ContainerCreating 0 27m
kube-scheduler-kubemaster 1/1 Running 0 38m
kubernetes-dashboard-5f7b999d65-qn8qn 1/1 Pending 0 8s
So, event though my problem originaly was that I cannot access to my dashboard, I guess that the real problem is deeper thant that.
I know that I just put a lot of information here, but I am a k8s beginner and I am completely lost on this.
There is an issue I experienced with coredns pods stuck in a pending mode when setting up your own cluster; which I resolve by adding pod network.
Looks like because there is no Network Addon installed, the nodes are taint as not-ready. Installing the Addon would remove the taints and the Pods will be able to schedule. In my case adding flannel fixed the issue.
EDIT: There is a note about this in the official k8s documentation - Create cluster with kubeadm:
The network must be deployed before any applications. Also, CoreDNS
will not start up before a network is installed. kubeadm only
supports Container Network Interface (CNI) based networks (and does
not support kubenet).
Actually it is the opposite of a deep or serious issue. This is a trivial issue. Always you see a pod stuck on Pending state, it means the scheduler is having a hard time to schedule the pod; mostly because there are no enough resources on the node.
In your case it is a taint that has the node, and your pod doesn't have the toleration. What you have to do is to describe the node and get the taint:
kubectl describe node | grep -i taints
Note: you might have more then one taint. So you might want to do kubectl describe no NODE since with grep you will only see one taint.
Once you get the taint, that will be something like hello=world:NoSchedule; which means key=value:effect, you will have to add a toleration section in your Deployment. This is an example Deployment so you can see how it should look like:
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
name: nginx
labels:
app: nginx
spec:
replicas: 10
strategy:
type: Recreate
template:
metadata:
labels:
app: nginx
spec:
containers:
- image: nginx
name: nginx
ports:
- containerPort: 80
name: http
tolerations:
- effect: NoExecute #NoSchedule, PreferNoSchedule
key: node
operator: Equal
value: not-ready
tolerationSeconds: 3600
As you can see there is the toleration section in the yaml. So, if I would have a node with node=not-ready:NoExecute taint, no pod would be able to be scheduled on that node, unless would have this toleration.
Also you can remove the taint, if you don need it. To remove a taint you would describe the node, get the key of the taint and do:
kubectl taint node NODE key-
Hope it makes sense. Just add this section to your deployment, and it will work.
Set up the flannel network tool.
Running commands:
$ sysctl net.bridge.bridge-nf-call-iptables=1
$ kubectl apply -f
https://raw.githubusercontent.com/coreos/flannel/62e44c867a2846fefb68bd5f178daf4da3095ccb/Documentation/kube-flannel.yml

Error from server (NotFound): podmetrics.metrics.k8s.io "mem-example/memory-demo" not found

I am following this tutorial: https://kubernetes.io/docs/tasks/configure-pod-container/assign-memory-resource/
I have created the memory pod demo and I am trying to get the metrics from the pod but it is not working.
I installed the metrics server by cloning: https://github.com/kubernetes-incubator/metrics-server
And then running this command from top level:
kubectl create -f deploy/1.8+/
I am using kubernetes version 1.10.11.
The pod is definitely created:
λ kubectl get pod memory-demo --namespace=mem-example
NAME READY STATUS RESTARTS AGE
memory-demo 1/1 Running 0 6m
But the metics command does not work and gives an error:
λ kubectl top pod memory-demo --namespace=mem-example
Error from server (NotFound): podmetrics.metrics.k8s.io "mem-example/memory-demo" not found
What did I do wrong?
There are some patches to be done to metrics server deployment to get the metrics working.
Follow the below steps
kubectl delete -f deploy/1.8+/
wait till the metrics server gets undeployed
run the below command
kubectl create -f https://raw.githubusercontent.com/epasham/docker-repo/master/k8s/metrics-server.yaml
master $ kubectl get po -n kube-system
NAME READY STATUS RESTARTS AGE
coredns-78fcdf6894-6zg78 1/1 Running 0 2h
coredns-78fcdf6894-gk4sb 1/1 Running 0 2h
etcd-master 1/1 Running 0 2h
kube-apiserver-master 1/1 Running 0 2h
kube-controller-manager-master 1/1 Running 0 2h
kube-proxy-f5z9p 1/1 Running 0 2h
kube-proxy-ghbvn 1/1 Running 0 2h
kube-scheduler-master 1/1 Running 0 2h
metrics-server-85c54d44c8-rmvxh 2/2 Running 0 1m
weave-net-4j7cl 2/2 Running 1 2h
weave-net-82fzn 2/2 Running 1 2h
master $ kubectl top pod -n kube-system
NAME CPU(cores) MEMORY(bytes)
coredns-78fcdf6894-6zg78 2m 11Mi
coredns-78fcdf6894-gk4sb 2m 9Mi
etcd-master 14m 90Mi
kube-apiserver-master 24m 425Mi
kube-controller-manager-master 26m 62Mi
kube-proxy-f5z9p 2m 19Mi
kube-proxy-ghbvn 3m 17Mi
kube-scheduler-master 8m 14Mi
metrics-server-85c54d44c8-rmvxh 1m 19Mi
weave-net-4j7cl 2m 59Mi
weave-net-82fzn 1m 60Mi
Check and verify the below lines in metrics server deployment manifest.
command:
- /metrics-server
- --metric-resolution=30s
- --kubelet-preferred-address-types=InternalIP
- --kubelet-insecure-tls
On Minikube, I had to wait for 20-25 minutes after enabling the metrics-server addon. I was getting the same error for 20-25 minutes but later I could see the output without attempting for any solution.
I faced the similar issue of
Error from server (NotFound): podmetrics.metrics.k8s.io "default/apple-app" not found
I followed two steps and I was able to resolve the issue.
Download the latest customized components.yaml, which is their official file used for easy deployment.
Update the change
# - /metrics-server
- --kubelet-insecure-tls
- --kubelet-preferred-address-types=InternalIP
to the command section of the deployment specification. I have commented the first line because it is the entrypoint of the image used by kubernetes metrics-server.
$ docker image inspect k8s.gcr.io/metrics-server-amd64:v0.3.6 -f {{.ContainerConfig.Entrypoint}}
[/metrics-server]
Even If you use it or not, it doesn't matter.
Note: You have to wait for few seconds for it to properly work.
After this running the top command will work for you.
$ kubectl top pod apple-app
NAME CPU(cores) MEMORY(bytes)
apple-app 1m 3Mi
I know this is an old thread may be someone will find this answer useful.
You have to checkout the following repo:
https://github.com/kubernetes-incubator/metrics-server
Go to the root of the repo and checkout release-0.3.2.
Remove default metrics server by:
kubectl delete -f deploy/1.8+/
Download the container yaml
wget https://github.com/kubernetes-sigs/metrics-server/releases/download/v0.3.6/components.yaml
Edit the container.yaml by adding the following lines to the argument section. You will see these two lines there
args:
- --kubelet-preferred-address-types=InternalIP
- --kubelet-insecure-tls=true
There is only one args parameter in that file.
Deploy your pod/deployment and you should be able to do:
kubectl top pod <pod-name>

Kubernetes Canal CNI error on masters

I'm setting up a Kubernetes cluster on a customer.
I've done this process before multiple times, including dealing with vagrant specifics and I've been able to constantly get a K8s cluster up and running without too much fuss.
Now, on this customer I'm doing the same but I've been finding a lot of issues when setting things up, which is completely unexpected.
Comparing to other places where I've setup Kubernetes, the only obvious difference that I see is that I have a proxy server which I constantly have to battle with. Nothing that a NO_PROXY env hasn't been able to handle.
The main issue I'm facing is setting up Canal (Calico + Flannel).
For some reason, on Masters 2 and 3 it just won't start.
NAMESPACE NAME READY STATUS RESTARTS AGE IP NODE
kube-system canal-2pvpr 2/3 CrashLoopBackOff 7 14m 10.136.3.37 devmn2.cpdprd.pt
kube-system canal-rdmnl 2/3 CrashLoopBackOff 7 14m 10.136.3.38 devmn3.cpdprd.pt
kube-system canal-swxrw 3/3 Running 0 14m 10.136.3.36 devmn1.cpdprd.pt
kube-system kube-apiserver-devmn1.cpdprd.pt 1/1 Running 1 1h 10.136.3.36 devmn1.cpdprd.pt
kube-system kube-apiserver-devmn2.cpdprd.pt 1/1 Running 1 4h 10.136.3.37 devmn2.cpdprd.pt
kube-system kube-apiserver-devmn3.cpdprd.pt 1/1 Running 1 1h 10.136.3.38 devmn3.cpdprd.pt
kube-system kube-controller-manager-devmn1.cpdprd.pt 1/1 Running 0 15m 10.136.3.36 devmn1.cpdprd.pt
kube-system kube-controller-manager-devmn2.cpdprd.pt 1/1 Running 0 15m 10.136.3.37 devmn2.cpdprd.pt
kube-system kube-controller-manager-devmn3.cpdprd.pt 1/1 Running 0 15m 10.136.3.38 devmn3.cpdprd.pt
kube-system kube-dns-86f4d74b45-vqdb4 0/3 ContainerCreating 0 1h <none> devmn2.cpdprd.pt
kube-system kube-proxy-4j7dp 1/1 Running 1 2h 10.136.3.38 devmn3.cpdprd.pt
kube-system kube-proxy-l2wpm 1/1 Running 1 2h 10.136.3.36 devmn1.cpdprd.pt
kube-system kube-proxy-scm9g 1/1 Running 1 2h 10.136.3.37 devmn2.cpdprd.pt
kube-system kube-scheduler-devmn1.cpdprd.pt 1/1 Running 1 1h 10.136.3.36 devmn1.cpdprd.pt
kube-system kube-scheduler-devmn2.cpdprd.pt 1/1 Running 1 4h 10.136.3.37 devmn2.cpdprd.pt
kube-system kube-scheduler-devmn3.cpdprd.pt 1/1 Running 1 1h 10.136.3.38 devmn3.cpdprd.pt
Looking for the specific error, I've come to find out that the issue is with the kube-flannel container, which is throwing an error:
[exXXXXX#devmn1 ~]$ kubectl logs canal-rdmnl -n kube-system -c kube-flannel
I0518 16:01:22.555513 1 main.go:487] Using interface with name ens192 and address 10.136.3.38
I0518 16:01:22.556080 1 main.go:504] Defaulting external address to interface address (10.136.3.38)
I0518 16:01:22.565141 1 kube.go:130] Waiting 10m0s for node controller to sync
I0518 16:01:22.565167 1 kube.go:283] Starting kube subnet manager
I0518 16:01:23.565280 1 kube.go:137] Node controller sync successful
I0518 16:01:23.565311 1 main.go:234] Created subnet manager: Kubernetes Subnet Manager - devmn3.cpdprd.pt
I0518 16:01:23.565331 1 main.go:237] Installing signal handlers
I0518 16:01:23.565388 1 main.go:352] Found network config - Backend type: vxlan
I0518 16:01:23.565440 1 vxlan.go:119] VXLAN config: VNI=1 Port=0 GBP=false DirectRouting=false
E0518 16:01:23.565619 1 main.go:279] Error registering network: failed to acquire lease: node "devmn3.cpdprd.pt" pod cidr not assigned
I0518 16:01:23.565671 1 main.go:332] Stopping shutdownHandler...
I just can't understand why.
Some relevant info:
My clusterCIDR and podCIDR are: 192.168.151.0/25 (I know, it's weird, don't ask unless it's a huge issue)
I've setup etcd on systemd
I've modified the kube-controller-manager.yaml to change the mask size to 25 (otherwise the IP mentioned before wouldn't work).
I'm installing everything with Kubeadm. One weird thing I did notice was that, when viewing the config (kubeadm config view) much of the information that I had setup on the kubeadm config.yaml (for kubeadm init) was not present in the config view, including the paths to etcd certs.
I'm also not sure why that happened, but I've fixed it (hopefully) by editing the kubeadm config map (kubectl edit cm kubeadm-config -n kube-system) and saving it.
Still no luck with canal.
Can anyone help me figure out what's wrong?
I have documented pretty much every step of the configuration I've done, so if required I may be able to provide it.
EDIT:
I figured how meanwhile that indeed my master2 and 3 do not have a podCIDR associated. Why would this happen? And how can I add it?
Try to edit:
/etc/kubernetes/manifests/kube-controller-manager.yaml
and add
--allocate-node-cidrs=true
--cluster-cidr=192.168.151.0/25
then, reload kubelet.
I found this information here and it was useful for me.

Kubernetes pod not starting

I have a kubernetes cluster with 5 nodes. When I add a simple nginx pod it will be scheduled to one of the nodes but it will not start up. It will not even pull the image.
This is the nginx.yaml file:
apiVersion: v1
kind: Pod
metadata:
name: nginx
spec:
containers:
- name: nginx
image: nginx
ports:
- containerPort: 80
when I describe the pod there is one event: Successfully assigned busybox to up02 When I log in to the up02 and check to see if there are any images pulled I see it didn't get pulled so I pulled it manually (I thought maybe it needs some kick start ;) )
The pod will allways stay in the Container creating state. It's not only with this pod, the problem is with any pod I try to add.
There are some pods running on the machine which is necessary for Kubernetes to operate:
up#up01:~$ kubectl get pods --all-namespaces
NAMESPACE NAME READY STATUS RESTARTS AGE
default busybox 0/1 ContainerCreating 0 11m
default nginx 0/1 ContainerCreating 0 22m
kube-system dummy-2088944543-n1cd5 1/1 Running 0 5d
kube-system etcd-up01 1/1 Running 0 5d
kube-system kube-apiserver-up01 1/1 Running 0 5d
kube-system kube-controller-manager-up01 1/1 Running 0 5d
kube-system kube-discovery-1769846148-xfpls 1/1 Running 0 5d
kube-system kube-dns-2924299975-5rzz8 4/4 Running 0 5d
kube-system kube-proxy-17bpl 1/1 Running 2 3d
kube-system kube-proxy-3pk63 1/1 Running 0 3d
kube-system kube-proxy-h3wrj 1/1 Running 0 5d
kube-system kube-proxy-wzqv4 1/1 Running 0 3d
kube-system kube-proxy-z3xxx 1/1 Running 0 3d
kube-system kube-scheduler-up01 1/1 Running 0 5d
kube-system kubernetes-dashboard-3203831700-3xfbd 1/1 Running 0 5d
kube-system weave-net-6c0nr 2/2 Running 0 3d
kube-system weave-net-dchhf 2/2 Running 0 5d
kube-system weave-net-hshvg 2/2 Running 4 3d
kube-system weave-net-n684c 2/2 Running 1 3d
kube-system weave-net-r5319 2/2 Running 0 3d
You can do
kubectl describe pods <pod>
to get more info on what's happening.
Can you recreate the nginx pod again in namespace kube-system?
kubectl create --namespace kube-system -f nginx.yaml
this should fix your problem.
Second, do you have proxy in your environment, take a look as well.
make sure that your namespace and service account information is correct. if you've configured your services or deployments to use a namespace or service account, that namespace needs to exist.
If you configured it to use a non - default service account then that has to exist as well, and the service account should be created after the namespace.
you shouldn't be necessarily using the kube system namespace. namespaces exist so there can be more than one of them and control the flow of traffic inside of a cluster.
you should also set potentially set permissions for your namespace. read this here.
https://kubernetes.io/docs/reference/access-authn-authz/rbac/#service-account-permissions