Cannot upgrade node using kubespray - kubernetes

I have a test on-premise Kubernetes cluster on CentOS 7.4. The current Kubernetes version is 1.10.4, and I am trying to upgrade to 1.11.5 using kubespray.
The command is:
ansible-playbook upgrade-cluster.yml -b -i inventory/k8s-test/hosts.ini -e kube_version=v1.11.5
Masters are upgraded successfully, but nodes are not.
The error is:
fatal: [kubernodetst1]: FAILED! => {"changed": true, "cmd":
["/usr/local/bin/kubeadm", "join", "--config",
"/etc/kubernetes/kubeadm-client.conf",
"--ignore-preflight-errors=all",
"--discovery-token-unsafe-skip-ca-verification"], "delta":
"0:00:00.040038", "end": "2018-12-13 15:55:56.162387", "msg":
"non-zero return code", "rc": 3, "start": "2018-12-13
15:55:56.122349", "stderr": "discovery: Invalid value: \"\": using
token-based discovery without discoveryTokenCACertHashes can be
unsafe. set --discovery-token-unsafe-skip-ca-verification to
continue", "stderr_lines": ["discovery: Invalid value: \"\": using
token-based discovery without discoveryTokenCACertHashes can be
unsafe. set --discovery-token-unsafe-skip-ca-verification to
continue"], "stdout": "", "stdout_lines": []}

You have an incorrect CA on the nodes; regenerate the certificates and try again.
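If you would rather keep CA verification on during the join instead of skipping it, a rough sketch of the usual alternative, assuming the cluster CA sits at the kubeadm default /etc/kubernetes/pki/ca.crt (kubespray may keep it under /etc/kubernetes/ssl instead):
# compute the sha256 hash of the CA public key in the format kubeadm discovery expects
openssl x509 -pubkey -in /etc/kubernetes/pki/ca.crt \
  | openssl rsa -pubin -outform der 2>/dev/null \
  | openssl dgst -sha256 -hex | sed 's/^.* //'
# then pass it to kubeadm join instead of --discovery-token-unsafe-skip-ca-verification
kubeadm join <master-ip>:6443 --token <token> --discovery-token-ca-cert-hash sha256:<hash-from-above>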

Related

Kerberos not found ansible

I am trying to run playbooks on my Windows Servers. Some work but others give me the following errors:
UNREACHABLE! => {"changed": false, "msg": "kerberos: authGSSClientStep() failed: (('Unspecified GSS failure. Minor code may provide more information', 851968), ('Server not found in Kerberos database', -1765328377))", "unreachable": true}
Or
UNREACHABLE! => {"changed": false, "msg": "kerberos: ('Connection aborted.', ConnectionResetError(104, 'Connexion ré-initialisée par le correspondant'))", "unreachable": true}
Thank you!
My krb5.conf file on my PC is OK.
I checked the SPNs, and my different Windows servers have the same configuration.
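A couple of basic checks that can narrow this down; a rough sketch where the realm, account, and host name are placeholders, not values from the question:
# on the Ansible control node: confirm a Kerberos ticket can be obtained for the WinRM account
kinit ansible_user@EXAMPLE.COM
klist
# on a domain controller: confirm an HTTP SPN is registered for the exact host name Ansible connects to
setspn -Q HTTP/winserver01.example.com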

Kubespray scale Ansible playbook cannot find /etc/kubernetes/admin.conf

I want to extend my Kubernetes cluster by one node.
So I run the scale.yaml Ansible playbook:
ansible-playbook -i inventory/local/hosts.ini --become --become-user=root scale.yml
But I get the following error when the control plane certificates are uploaded:
TASK [Upload control plane certificates] ***************************************************************************************************************************************************
ok: [jay]
fatal: [sam]: FAILED! => {"changed": false, "cmd": ["/usr/local/bin/kubeadm", "init", "phase", "--config", "/etc/kubernetes/kubeadm-config.yaml", "upload-certs", "--upload-certs"], "delta": "0:00:00.039489", "end": "2022-01-08 11:31:37.708540", "msg": "non-zero return code", "rc": 1, "start": "2022-01-08 11:31:37.669051", "stderr": "error execution phase upload-certs: failed to load admin kubeconfig: open /etc/kubernetes/admin.conf: no such file or directory\nTo see the stack trace of this error execute with --v=5 or higher", "stderr_lines": ["error execution phase upload-certs: failed to load admin kubeconfig: open /etc/kubernetes/admin.conf: no such file or directory", "To see the stack trace of this error execute with --v=5 or higher"], "stdout": "", "stdout_lines": []}
Anyone has an idea what the problem could be?
Thanks in advance.
I solved it myself.
I copied /etc/kubernetes/admin.conf and /etc/kubernetes/ssl/ca.* to the new node, and now the scale playbook works. Maybe this is not the right way, but it worked...
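For reference, roughly what that copy looked like (host names taken from the task output above; assuming root SSH access between the nodes and that kubespray keeps the CA under /etc/kubernetes/ssl):
# run from the existing control-plane node ("jay"), copying to the new node ("sam")
ssh root@sam 'mkdir -p /etc/kubernetes/ssl'
scp /etc/kubernetes/admin.conf root@sam:/etc/kubernetes/admin.conf
scp /etc/kubernetes/ssl/ca.* root@sam:/etc/kubernetes/ssl/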

Kubespray fails with "Found multiple CRI sockets, please use --cri-socket to select one"

Problem encountered
When deploying a cluster with Kubespray, CRI-O, and Cilium, I get an error about having multiple CRI sockets to choose from.
Full error
fatal: [p3kubemaster1]: FAILED! => {"changed": true, "cmd": " mkdir -p /etc/kubernetes/external_kubeconfig && /usr/local/bin/kubeadm init phase kubeconfig admin --kubeconfig-dir /etc/kubernetes/external_kubeconfig --cert-dir /etc/kubernetes/ssl --apiserver-advertise-address 10.10.3.15 --apiserver-bind-port 6443 >/dev/null && cat /etc/kubernetes/external_kubeconfig/admin.conf && rm -rf /etc/kubernetes/external_kubeconfig ", "delta": "0:00:00.028808", "end": "2019-09-02 13:01:11.472480", "msg": "non-zero return code", "rc": 1, "start": "2019-09-02 13:01:11.443672", "stderr": "Found multiple CRI sockets, please use --cri-socket to select one: /var/run/dockershim.sock, /var/run/crio/crio.sock", "stderr_lines": ["Found multiple CRI sockets, please use --cri-socket to select one: /var/run/dockershim.sock, /var/run/crio/crio.sock"], "stdout": "", "stdout_lines": []}
Interesting part
kubeadm init phase kubeconfig admin --kubeconfig-dir /etc/kubernetes/external_kubeconfig [...] >/dev/null,"stderr": "Found multiple CRI sockets, please use --cri-socket to select one: /var/run/dockershim.sock, /var/run/crio/crio.sock"}
What I've tried
1) I've tried to set the --cri-socket flag inside /var/lib/kubelet/kubeadm-flags.env:
KUBELET_KUBEADM_ARGS="--container-runtime=remote --container-runtime-endpoint=/var/run/crio/crio.sock --cri-socket=/var/run/crio/crio.sock"
=> Makes no difference
2) I've checked /etc/kubernetes/kubeadm-config.yaml, but it already contains the following section:
apiVersion: kubeadm.k8s.io/v1beta2
kind: InitConfiguration
localAPIEndpoint:
  advertiseAddress: 10.10.3.15
  bindPort: 6443
certificateKey: 9063a1ccc9c5e926e02f245c06b8d9f2ff3xxxxxxxxxxxx
nodeRegistration:
  name: p3kubemaster1
  taints:
  - effect: NoSchedule
    key: node-role.kubernetes.io/master
  criSocket: /var/run/crio/crio.sock
=> It already ends with the criSocket key, so nothing to do...
3) I tried to edit the Ansible script to add --cri-socket to the existing command, but it fails with Unknown command --cri-socket.
Existing:
{% if kubeadm_version is version('v1.14.0', '>=') %}
init phase
Tried:
{% if kubeadm_version is version('v1.14.0', '>=') %}
init phase --cri-socket /var/run/crio/crio.sock
Theories
It seems that the problem comes from the kubeadm init phase command, which is not compatible with the --cri-socket flag... (see point 3).
Even though the correct socket is set in the config file (see point 2), kubeadm init phase is not using it.
Any ideas would be appreciated ;-)
Thanks
This worked for me when there were multiple CRI sockets:
kubeadm init --pod-network-cidr=10.244.0.0/16 --cri-socket=unix:///var/run/cri-dockerd.sock
The image pull command before initialization, when there are multiple CRIs:
kubeadm config images pull --cri-socket=unix:///var/run/cri-dockerd.sock
You can choose the CRI socket path from the following table; see the original Kubernetes documentation.
Runtime                              Path to Unix domain socket
containerd                           unix:///var/run/containerd/containerd.sock
CRI-O                                unix:///var/run/crio/crio.sock
Docker Engine (using cri-dockerd)    unix:///var/run/cri-dockerd.sock
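Before picking one, it can help to check which of those sockets actually exist on the node (paths from the table above):
# list whichever CRI sockets are present; kubeadm's auto-detection complains when it finds more than one
ls -l /var/run/dockershim.sock /var/run/crio/crio.sock /var/run/containerd/containerd.sock /var/run/cri-dockerd.sock 2>/dev/null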
I finally got it!
The initial kubespray command was:
kubeadm init phase kubeconfig admin --kubeconfig-dir {{ kube_config_dir }}/external_kubeconfig
⚠️ It seems that with the --kubeconfig-dir flag, kubeadm was not taking the CRI socket configuration into account.
So I changed the line to:
kubeadm init phase kubeconfig admin --config /etc/kubernetes/kubeadm-config.yaml
For people having similar issues:
The InitConfig part that made it work on the master is the following:
apiVersion: kubeadm.k8s.io/v1beta2
kind: InitConfiguration
localAPIEndpoint:
  advertiseAddress: 10.10.3.15
  bindPort: 6443
certificateKey: 9063a1ccc9c5e926e02f245c06b8d9f2ff3c1eb2dafe5fbe2595ab4ab2d3eb1a
nodeRegistration:
  name: p3kubemaster1
  taints:
  - effect: NoSchedule
    key: node-role.kubernetes.io/master
  criSocket: /var/run/crio/crio.sock
In kubespray you must update the file roles/kubernetes/client/tasks/main.yml, around line 57.
You'll have to comment out the initial --kubeconfig-dir section and replace it with the path of the InitConfig file.
For me it was generated by kubespray in /etc/kubernetes/kubeadm-config.yaml on the kube master. Check that this file exists on your side and that it contains the criSocket key in the nodeRegistration section.
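A quick sanity check on the master for both points:
# confirm the generated config exists and carries the criSocket key under nodeRegistration
ls -l /etc/kubernetes/kubeadm-config.yaml
grep -n 'criSocket' /etc/kubernetes/kubeadm-config.yaml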
I did some research and came upon this GitHub thread, which then pointed me to another one.
This seems to be a kubeadm issue which was already fixed, so the solution is available in v1.15.
Could you please upgrade to that version (I am not sure which one you are using, based on both of your questions that I have worked on) and see if the problem still persists?

kubectl does not work with remote cluster on Windows

I want to control my remote k8s cluster, but I have a problem with it.
I have a kubeconfig file from the k8s cluster admin, but when I try to connect to the cluster I get an error; when I query the same endpoint with a browser, the result is OK.
kubectl version --kubeconfig ./.kube/config -v=12 --insecure-skip-tls-verify=true --alsologtostderr
I0211 18:13:42.625408 12960 loader.go:359] Config loaded from file ./.kube/config
...
I0211 18:13:54.691273 12960 helpers.go:216] Connection error: Get https://k8s-t-deponl-01.raiffeisen.ru:8443/version?timeout=32s: Tunnel Connection Failed
F0211 18:13:54.692219 12960 helpers.go:116] Unable to connect to the server: Tunnel Connection Failed
The browser returns:
{
  "major": "1",
  "minor": "11",
  "gitVersion": "v1.11.5",
  "gitCommit": "753b2dbc622f5cc417845f0ff8a77f539a4213ea",
  "gitTreeState": "clean",
  "buildDate": "2018-11-26T14:31:35Z",
  "goVersion": "go1.10.3",
  "compiler": "gc",
  "platform": "linux/amd64"
}
Why do I have this problem?

Unable to fully collect metrics when installing metrics-server

I have installed the metrics server on Kubernetes, but it's not working, and it logs:
unable to fully collect metrics: [unable to fully scrape metrics from source kubelet_summary:xxx: unable to fetch metrics from Kubelet ... (X.X): Get https:....: x509: cannot validate certificate for 1x.x.
x509: certificate signed by unknown authority
I was able to get metrics after I modified the deployment YAML and added:
command:
- /metrics-server
- --kubelet-insecure-tls
- --kubelet-preferred-address-types=InternalIP
This now collects metrics, and kubectl top node returns results...
but the logs still show:
E1120 11:58:45.624974 1 reststorage.go:144] unable to fetch pod metrics for pod dev/pod-6bffbb9769-6z6qz: no metrics known for pod
E1120 11:58:45.625289 1 reststorage.go:144] unable to fetch pod metrics for pod dev/pod-6bffbb9769-rzvfj: no metrics known for pod
E1120 12:00:06.462505 1 manager.go:102] unable to fully collect metrics: [unable to fully scrape metrics from source kubelet_summary:ip-1x.x.x.eu-west-1.compute.internal: unable to get CPU for container ...discarding data: missing cpu usage metric, unable to fully scrape metrics from source
So, questions:
1) All this works on minikube, but not on my dev cluster; why would that be?
2) In production I don't want to use insecure TLS, so can someone please explain why this issue arises, or point me to some resource?
kubeadm generates the kubelet certificates at /var/lib/kubelet/pki, and those certificates (kubelet.crt and kubelet.key) are signed by a different CA from the one used to generate all the other certificates at /etc/kubernetes/pki.
You need to regenerate the kubelet certificates so that they are signed by your root CA (/etc/kubernetes/pki/ca.crt).
You can use openssl or cfssl to generate the new certificates (I am using cfssl).
$ mkdir certs; cd certs
$ cp /etc/kubernetes/pki/ca.crt ca.pem
$ cp /etc/kubernetes/pki/ca.key ca-key.pem
Create a file kubelet-csr.json:
{
  "CN": "kubernetes",
  "hosts": [
    "127.0.0.1",
    "<node_name>",
    "kubernetes",
    "kubernetes.default",
    "kubernetes.default.svc",
    "kubernetes.default.svc.cluster",
    "kubernetes.default.svc.cluster.local"
  ],
  "key": {
    "algo": "rsa",
    "size": 2048
  },
  "names": [{
    "C": "US",
    "ST": "NY",
    "L": "City",
    "O": "Org",
    "OU": "Unit"
  }]
}
Create a ca-config.json file:
{
  "signing": {
    "default": {
      "expiry": "8760h"
    },
    "profiles": {
      "kubernetes": {
        "usages": [
          "signing",
          "key encipherment",
          "server auth",
          "client auth"
        ],
        "expiry": "8760h"
      }
    }
  }
}
Now generate the new certificates using the above files:
$ cfssl gencert -ca=ca.pem -ca-key=ca-key.pem \
--config=ca-config.json -profile=kubernetes \
kubelet-csr.json | cfssljson -bare kubelet
Replace the old certificates with the newly generated ones:
$ scp kubelet.pem <nodeip>:/var/lib/kubelet/pki/kubelet.crt
$ scp kubelet-key.pem <nodeip>:/var/lib/kubelet/pki/kubelet.key
Now restart the kubelet so that the new certificates take effect on your node.
$ systemctl restart kubelet
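To confirm the kubelet is serving the new certificate, one way is to inspect what it presents on its secure port (10250); a rough check, with <nodeip> as above:
# print the issuer and validity dates of the certificate the kubelet now serves
$ openssl s_client -connect <nodeip>:10250 </dev/null 2>/dev/null | openssl x509 -noout -issuer -dates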
Look at the following tickets to get the context of the issue:
https://github.com/kubernetes-incubator/metrics-server/issues/146
Hope this helps.