Kubespray fails with "Found multiple CRI sockets, please use --cri-socket to select one" - kubernetes

Problem encountered
When deploying a cluster with Kubespray, CRI-O and Cilium, I get an error about having multiple CRI sockets to choose from.
Full error
fatal: [p3kubemaster1]: FAILED! => {"changed": true, "cmd": " mkdir -p /etc/kubernetes/external_kubeconfig && /usr/local/bin/kubeadm init phase kubeconfig admin --kubeconfig-dir /etc/kubernetes/external_kubeconfig --cert-dir /etc/kubernetes/ssl --apiserver-advertise-address 10.10.3.15 --apiserver-bind-port 6443 >/dev/null && cat /etc/kubernetes/external_kubeconfig/admin.conf && rm -rf /etc/kubernetes/external_kubeconfig ", "delta": "0:00:00.028808", "end": "2019-09-02 13:01:11.472480", "msg": "non-zero return code", "rc": 1, "start": "2019-09-02 13:01:11.443672", "stderr": "Found multiple CRI sockets, please use --cri-socket to select one: /var/run/dockershim.sock, /var/run/crio/crio.sock", "stderr_lines": ["Found multiple CRI sockets, please use --cri-socket to select one: /var/run/dockershim.sock, /var/run/crio/crio.sock"], "stdout": "", "stdout_lines": []}
Interesting part
kubeadm init phase kubeconfig admin --kubeconfig-dir /etc/kubernetes/external_kubeconfig [...] >/dev/null,"stderr": "Found multiple CRI sockets, please use --cri-socket to select one: /var/run/dockershim.sock, /var/run/crio/crio.sock"}
What I've tried
1) I've tried to set the --cri-socket flag inside /var/lib/kubelet/kubeadm-flags.env:
KUBELET_KUBEADM_ARGS="--container-runtime=remote --container-runtime-endpoint=/var/run/crio/crio.sock --cri-socket=/var/run/crio/crio.sock"
=> Makes no difference
2) I've checked /etc/kubernetes/kubeadm-config.yaml but it already contains the following section :
apiVersion: kubeadm.k8s.io/v1beta2
kind: InitConfiguration
localAPIEndpoint:
  advertiseAddress: 10.10.3.15
  bindPort: 6443
certificateKey: 9063a1ccc9c5e926e02f245c06b8d9f2ff3xxxxxxxxxxxx
nodeRegistration:
  name: p3kubemaster1
  taints:
  - effect: NoSchedule
    key: node-role.kubernetes.io/master
  criSocket: /var/run/crio/crio.sock
=> It already ends with the criSocket key, so there is nothing to change...
3) Tried to edit the Ansible script to add --cri-socket to the existing command, but it fails with Unknown command --cri-socket
Existing :
{% if kubeadm_version is version('v1.14.0', '>=') %}
init phase
Tried :
{% if kubeadm_version is version('v1.14.0', '>=') %}
init phase --crio socket /var/run/crio/crio.sock
Theories
It seems that the problem comes from the kubeadm init phase command, which does not appear to accept the --cri-socket flag (see point 3).
Even though the correct socket is set in the config file (see point 2), kubeadm init phase is not using it.
Any ideas would be appreciated ;-)
thx

This worked for me for multiple cri sockets
kubeadm init --pod-network-cidr=10.244.0.0/16 --cri-socket=unix:///var/run/cri-dockerd.sock
Image pull command before initialization for multiple cri:
kubeadm config images pull --cri-socket=unix:///var/run/cri-dockerd.sock
You can choose cri socket path from the following table. See original documentation here
Runtime                            Path to Unix domain socket
containerd                         unix:///var/run/containerd/containerd.sock
CRI-O                              unix:///var/run/crio/crio.sock
Docker Engine (using cri-dockerd)  unix:///var/run/cri-dockerd.sock
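To see which of these sockets actually exist on a node (and so why kubeadm reports more than one), a small check like the following may help. This is only a sketch: the function names are made up for illustration, the candidate paths are the ones from the table above, and a socket file existing on disk does not prove that a runtime is still serving on it.

```python
import os
import stat

# Socket endpoints from the table above.
KNOWN_CRI_SOCKETS = {
    "containerd": "unix:///var/run/containerd/containerd.sock",
    "CRI-O": "unix:///var/run/crio/crio.sock",
    "Docker Engine (cri-dockerd)": "unix:///var/run/cri-dockerd.sock",
}


def socket_path(endpoint):
    """Strip the unix:// scheme so the endpoint can be checked on disk."""
    prefix = "unix://"
    return endpoint[len(prefix):] if endpoint.startswith(prefix) else endpoint


def present_cri_sockets(candidates=KNOWN_CRI_SOCKETS):
    """Return {runtime: endpoint} for every candidate that exists as a socket file."""
    found = {}
    for runtime, endpoint in candidates.items():
        try:
            if stat.S_ISSOCK(os.stat(socket_path(endpoint)).st_mode):
                found[runtime] = endpoint
        except OSError:
            # Missing path or no permission: treat it as "not present".
            pass
    return found


if __name__ == "__main__":
    found = present_cri_sockets()
    for runtime, endpoint in found.items():
        print(f"{runtime}: {endpoint}")
    if len(found) > 1:
        print("More than one CRI socket exists; pass --cri-socket explicitly.")
```

If it lists more than one entry, pick the one you want and pass it with --cri-socket as in the commands above.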

I finally got it !
The initial kubespray command was:
kubeadm init phase kubeconfig admin --kubeconfig-dir {{ kube_config_dir }}/external_kubeconfig
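⚠️ It seems that, when called with only the --kubeconfig-dir flag, kubeadm does not read the criSocket from the config file and tries to detect the CRI socket itself, which fails when more than one socket is present.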
So I changed the line to:
kubeadm init phase kubeconfig admin --config /etc/kubernetes/kubeadm-config.yaml
For people having similar issues:
The InitConfig part that made it work on the master is the following:
apiVersion: kubeadm.k8s.io/v1beta2
kind: InitConfiguration
localAPIEndpoint:
  advertiseAddress: 10.10.3.15
  bindPort: 6443
certificateKey: 9063a1ccc9c5e926e02f245c06b8d9f2ff3c1eb2dafe5fbe2595ab4ab2d3eb1a
nodeRegistration:
  name: p3kubemaster1
  taints:
  - effect: NoSchedule
    key: node-role.kubernetes.io/master
  criSocket: /var/run/crio/crio.sock
In kubespray you must update the file roles/kubernetes/client/tasks/main.yml around line 57.
You'll have to comment out the initial --kubeconfig-dir section and replace it with the path of the InitConfig file.
For me it was generated by kubespray in /etc/kubernetes/kubeadm-config.yaml on the kube master. Check that this file exists on your side and that it contains the criSocket key in the nodeRegistration section.

I did some research and came upon this GitHub thread,
which then pointed me to another one here.
This seems to be a kubeadm issue that has already been fixed; the fix is available in v1.15.
Could you please upgrade to that version (I am not sure which one you are using, based on both of your questions that I have worked on) and see if the problem still persists?

Related

Unable to initialize kubeadm on the controller node: it says port in use

sudo kubeadm init
I0609 02:20:26.963781 3600 version.go:252] remote version is much newer: v1.21.1; falling back to: stable-1.18
W0609 02:20:27.069495 3600 configset.go:202] WARNING: kubeadm cannot validate component configs for API groups [kubelet.config.k8s.io kubeproxy.config.k8s.io]
[init] Using Kubernetes version: v1.18.19
[preflight] Running pre-flight checks
error execution phase preflight: [preflight] Some fatal errors occurred:
[ERROR Port-10259]: Port 10259 is in use
[ERROR Port-10257]: Port 10257 is in use
[ERROR FileAvailable--etc-kubernetes-manifests-kube-apiserver.yaml]: /etc/kubernetes/manifests/kube-apiserver.yaml already exists
[ERROR FileAvailable--etc-kubernetes-manifests-kube-controller-manager.yaml]: /etc/kubernetes/manifests/kube-controller-manager.yaml already exists
[ERROR FileAvailable--etc-kubernetes-manifests-kube-scheduler.yaml]: /etc/kubernetes/manifests/kube-scheduler.yaml already exists
[ERROR FileAvailable--etc-kubernetes-manifests-etcd.yaml]: /etc/kubernetes/manifests/etcd.yaml already exists
[ERROR Port-10250]: Port 10250 is in use
[ERROR Port-2379]: Port 2379 is in use
[ERROR Port-2380]: Port 2380 is in use
[ERROR DirAvailable--var-lib-etcd]: /var/lib/etcd is not empty
[preflight] If you know what you are doing, you can make a check non-fatal with `--ignore-preflight-errors=...`
To see the stack trace of this error execute with --v=5 or higher
Hi and welcome to Stack Overflow.
"Port in use" means that there's a process running that uses that port. So you need to stop that process. Since you already ran kubeadm init once, it must have already changed a number of things.
First run kubeadm reset to undo all of the changes from the first time you ran it.
Then run systemctl restart kubelet.
Finally, when you run kubeadm init you should no longer get the error.
If you still get this error after following the steps above:
[ERROR DirAvailable--var-lib-etcd]: /var/lib/etcd is not empty
[preflight] If you know what you are doing, you can make a check non-fatal with `--ignore-preflight-errors=...`
Then, remove the etcd folder (/var/lib/etcd) before you run kubeadm init.
Note:
This solution worked for other users.
The warning itself is not an issue; it only says that kubeadm no longer validates the KubeletConfiguration and KubeProxyConfiguration that it feeds to the kubelet and kube-proxy components.
I also hit this issue. Additionally, I had to kill the kubelet manually, using:
$ pkill kubelet
kubeadm init worked without issues after this.
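Before re-running kubeadm init, it can be useful to confirm that the ports from the preflight errors are really free again after kubeadm reset and stopping the kubelet. A minimal sketch follows, assuming IPv4 and that trying to bind a port is a good enough stand-in for the preflight check. The function names are made up for illustration.

```python
import errno
import socket

# Ports named in the preflight errors above.
PREFLIGHT_PORTS = [10250, 10257, 10259, 2379, 2380]


def port_is_free(port, host=""):
    """Try to bind the port (all IPv4 interfaces by default); True means nothing holds it."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((host, port))
        except OSError as exc:
            if exc.errno == errno.EADDRINUSE:
                return False
            raise
    return True


def ports_in_use(ports=PREFLIGHT_PORTS):
    """Return the subset of ports that some process is still holding."""
    return [port for port in ports if not port_is_free(port)]


if __name__ == "__main__":
    busy = ports_in_use()
    if busy:
        print("Still in use:", ", ".join(str(port) for port in busy))
    else:
        print("All checked ports are free.")
```

If a port is still reported as in use, something like the kubelet or an old control-plane container is still running, which matches the pkill kubelet experience above.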

Proxmox LXC: add linux.kernel_modules

I am trying to setup an LXC container (debian) as a Kubernetes node.
I have gotten far enough that the only thing in the way is kubeadm init...
error execution phase preflight: [preflight] Some fatal errors occurred:
[ERROR SystemVerification]: failed to parse kernel config: unable to load kernel module: "configs", output: "modprobe: ERROR: ../libkmod/libkmod.c:586 kmod_search_moddep() could not open moddep file '/lib/modules/5.4.44-2-pve/modules.dep.bin'\nmodprobe: FATAL: Module configs not found in directory /lib/modules/5.4.44-2-pve\n", err: exit status 1
[preflight] If you know what you are doing, you can make a check non-fatal with `--ignore-preflight-errors=...`
To see the stack trace of this error execute with --v=5 or higher
After some research I figured out that I probably need to add the following: linux.kernel_modules: ip_tables,ip6_tables,netlink_diag,nf_nat,overlay
But adding this to /etc/pve/lxc/107.conf doesn't do anything.
Does anybody have a clue how to add the linux kernel modules?
To allow loading any module with modprobe inside a privileged Proxmox LXC container, you need to add these options to the container config:
lxc.apparmor.profile: unconfined
lxc.cgroup.devices.allow: a
lxc.cap.drop:
lxc.mount.auto: proc:rw sys:rw
lxc.mount.entry: /lib/modules lib/modules none bind 0 0
Before that, you must first create the /lib/modules folder inside the container.
I'm not sure what guide you are following but assuming that you have the required kernel modules on the host, this would do it:
lxc config set my-container linux.kernel_modules overlay
You can follow this guide from K3s too. Basically:
lxc config edit k3s-lxc
and
config:
  linux.kernel_modules: ip_tables,ip6_tables,netlink_diag,nf_nat,overlay
  raw.lxc: lxc.mount.auto=proc:rw sys:rw
  security.privileged: "true"
  security.nesting: "true"
✌️
To fix ERROR: ../libkmod/libkmod.c:586 kmod_search_moddep() could not open moddep file, run from the host:
pct set $VMID --mp0 /usr/lib/modules/$(uname -r),mp=/lib/modules/$(uname -r),ro=1,backup=0
To fix [ERROR SystemVerification]: failed to parse kernel config, run from the host:
pct push $VMID /boot/config-$(uname -r) /boot/config-$(uname -r)
Where $VMID is your container id.
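To confirm from inside the container that the two commands above had the intended effect, you could check for the two files they provide. This is a rough sketch: missing_kernel_files is a hypothetical helper, and it only looks at these two paths, not at everything kubeadm may consult.

```python
import os
import platform


def missing_kernel_files(release=None, root="/"):
    """Return the expected kernel files that do not exist under root."""
    release = release or platform.release()  # same value as `uname -r`
    wanted = [
        os.path.join("lib", "modules", release, "modules.dep.bin"),
        os.path.join("boot", f"config-{release}"),
    ]
    return [
        os.path.join(root, relative)
        for relative in wanted
        if not os.path.exists(os.path.join(root, relative))
    ]


if __name__ == "__main__":
    missing = missing_kernel_files()
    if missing:
        print("Missing inside the container:")
        for path in missing:
            print("  " + path)
    else:
        print("Kernel module index and kernel config are both present.")
```

Run it inside the container. An empty result means both the bind mount and the pushed kernel config are visible there.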

How to use kubectl commands to access a Rancher cluster through Ansible

I am currently developing a project where I need to get the pod names of a Kubernetes cluster running on Rancher using Ansible. I have run into a couple of problems that are preventing me from advancing.
I am executing a playbook to retrieve this information, instead of running a CLI command, because I want to manipulate those Rancher machines later on (e.g. install an rpm file).
Here is the playbook that I am executing to try to retrieve the pods' names from Rancher:
---
- hosts: localhost
  connection: local
  remote_user: root
  roles:
    - role: ansible.kubernetes-modules
    - role: hello-world
  vars:
    ansible_python_interpreter: '{{ ansible_playbook_python }}'
  collections:
    - community.kubernetes
  tasks:
    -
      name: Gather openShift Dependencies
      python_requirements_facts:
        dependencies:
          - openshift
    -
      name: Get the pods in the specific namespace
      k8s_info:
        kubeconfig: '/etc/ansible/RCCloudConfig'
        kind: Pod
        namespace: redmine
      register: pod_list
    -
      name: Print pod names
      debug:
        msg: "pod_list: {{ pod_list | json_query('resources[*].status.podIP') }} "
    - set_fact:
        pod_names: "{{pod_list|json_query('resources[*].metadata.name')}}"
The problem is that I am getting a Kubernetes module error each time I am trying to run the playbook:
ERROR! the role 'ansible.kubernetes-modules' was not found in community.kubernetes:ansible.legacy:/etc/ansible/roles:/home/jcp/.ansible/roles:/usr/share/ansible/roles:/etc/ansible/roles:/etc/ansible
The error appears to be in '/etc/ansible/GetKubectlPods': line 7, column 7, but may
be elsewhere in the file depending on the exact syntax problem.
The offending line appears to be:
roles:
- role: ansible.kubernetes-modules
^ here
If I remove the line where I try to load that role, I still get a similar error:
An exception occurred during task execution. To see the full traceback, use -vvv. The error was: ModuleNotFoundError: No module named 'kubernetes'
fatal: [localhost]: FAILED! => {"changed": false, "error": "No module named 'kubernetes'", "msg": "Failed to import the required Python library (openshift) on localhost.localdomain's Python /usr/bin/python3.6. Please read module documentation and install in the appropriate location. If the required library is installed, but Ansible is using the wrong Python interpreter, please consult the documentation on ansible_python_interpreter"}
I have already tried installing the Kubernetes module with ansible-galaxy, as well as openshift, on the machine.
Not sure what I am doing wrong since there are so many possibilities for what could be going wrong here.
Ansible Version Output:
ansible 2.9.9
config file = /etc/ansible/ansible.cfg
configured module search path = ['/home/jcp/.ansible/plugins/modules', '/usr/share/ansible/plugins/modules']
ansible python module location = /home/jcp/.local/lib/python3.6/site-packages/ansible
executable location = /home/jcp/.local/bin/ansible
python version = 3.6.8 (default, Nov 21 2019, 19:31:34) [GCC 8.3.1 20190507 (Red Hat 8.3.1-4)]
I've debugged the python_requirements_facts output for the openshift dependency, and this is what I have:
ok: [localhost] => {
    "openshift_dependencies": {
        "changed": false,
        "failed": false,
        "mismatched": {},
        "not_found": [],
        "python": "/usr/bin/python3.6",
        "python_system_path": [
            "/tmp/ansible_python_requirements_info_payload_5_kb4a7s/ansible_python_requirements_info_payload.zip",
            "/usr/lib64/python36.zip",
            "/usr/lib64/python3.6",
            "/usr/lib64/python3.6/lib-dynload",
            "/home/jcp/.local/lib/python3.6/site-packages",
            "/usr/local/lib/python3.6/site-packages",
            "/usr/local/lib/python3.6/site-packages/openshift-0.10.0.dev1-py3.6.egg",
            "/usr/lib64/python3.6/site-packages",
            "/usr/lib/python3.6/site-packages"
        ],
        "python_version": "3.6.8 (default, Nov 21 2019, 19:31:34) \n[GCC 8.3.1 20190507 (Red Hat 8.3.1-4)]",
        "valid": {
            "openshift": {
                "desired": null,
                "installed": "0.10.0.dev1"
            }
        }
    }
}
Thanks for your help in advance!
Edit: The below answer was given for OP's specific Ansible version (i.e. 2.9.9) and is still valid if you still use it. Since version 2.10, you also need to install the relevant ansible collection if not already present
ansible-galaxy collection install kubernetes.core
See the latest module documentation for more information
In Ansible 2.9.9, you're not supposed to do anything special to use the module except installing the needed python dependencies. See the module documentation for your Ansible version
Remove the line - role: ansible.kubernetes-modules, unless it is a role of your own, in which case you have to tell us more because this is not a correct declaration.
Remove the collections declaration.
Add the following task somewhere before using the module:
- name: Make sure python deps are installed
  pip:
    name: openshift
Your current python_requirements_facts task does nothing more than report that the dependency is not found. Register the result and debug it to see for yourself.
Now use the k8s_info module normally.
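Since the traceback complains about No module named 'kubernetes', it can help to ask the exact interpreter Ansible uses (/usr/bin/python3.6 in the error above) whether it can see the libraries. A small sketch follows. missing_modules is a made-up helper name, and it only checks that the modules can be found, not that their versions are compatible.

```python
import importlib.util
import sys


def missing_modules(names=("kubernetes", "openshift")):
    """Return the top-level module names this interpreter cannot find."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    print("interpreter:", sys.executable)
    missing = missing_modules()
    if missing:
        print("not importable here:", ", ".join(missing))
    else:
        print("kubernetes and openshift are both importable")
```

Save it to a file and run it with the interpreter from the error message. If it reports a module as not importable while pip claims it is installed, the package was installed for a different Python than the one Ansible is running.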

Received AliveMessage from a peer with the same PKI-ID as myself

I am attempting to port the Hyperledger Fabric Getting Started guide to Kubernetes, but I am struggling to get the peer1's to deploy. If I enable CORE_PEER_GOSSIP_BOOTSTRAP, I receive the error "Received AliveMessage from a peer with the same PKI-ID as myself".
How can I debug a peer reportedly having the same PKI-ID as another?
Using this as a starting point:
https://hyperledger-fabric.readthedocs.io/en/latest/getting_started.html
I am able to create:
orderer and cli pods in default namespace
peer0's one in each org1|org2 namespace.
peer1's but only if I disable (comment out) CORE_PEER_GOSSIP_BOOTSTRAP
If I enable CORE_PEER_GOSSIP_BOOTSTRAP for the peer1's, I receive the following warning and error:
[gossip/gossip#10.0.0.10:7051] NewGossipService -> WARN 01c External endpoint is empty, peer will not be accessible outside of its organization
...
[gossip/discovery#10.0.0.10:7051] handleAliveMessage -> ERRO 02a Bad configuration detected: Received AliveMessage from a peer with the same PKI-ID as myself: tag:EMPTY alive_msg:<membership:<pki_id:"[[REDACTED]]" > timestamp:<inc_number:1495468533769417608 seq_num:416 > >
In order to better map the Orderer, Peers to DNS names, I'm using Kubernetes Namespaces and this configuration:
OrdererOrgs:
  - Name: Orderer
    Domain: default.svc.cluster.local
    Specs:
      - Hostname: orderer
PeerOrgs:
  - Name: Org1
    Domain: org1.svc.cluster.local
    Template:
      Count: 2
    Users:
      Count: 2
  - Name: Org2
    Domain: org2.svc.cluster.local
    Template:
      Count: 2
    Users:
      Count: 2
In order to expose the peer0's to the other peers in the org and to expose the orderer, I have ClusterIP services for the peer0's (selecting only the peer0's) and orderer. It's inelegant but I'm trying to get it to work before I get it working more beautifully.
I am able to resolve orderer.default.svc.cluster.local, peer0.org1.svc.cluster.local, and peer0.org2.svc.cluster.local using nslookup from within a pod deployed to default on the cluster.
Absent a curl-like tool for gRPC, I am able to open sockets against these endpoints on 7051 and 7053.
First, make sure you are using the right certificates.
Second, verify that your environment/configuration for gossip is set correctly
environment:
  - CORE_PEER_GOSSIP_EXTERNALENDPOINT=peer1.org1.example.com:8051
  - CORE_PEER_GOSSIP_BOOTSTRAP=peer0.org1.example.com:7051
  - CORE_PEER_GOSSIP_ENDPOINT=peer0.org1.example.com:7051
OR in core.yaml
peer:
  gossip:
    bootstrap: peer0.org1.example.com:7051
    externalEndpoint: peer1.org1.example.com:8051
    endpoint: peer0.org1.example.com:7051
Edited: Also make sure that you have properly set up your CA.
Hope this helps, it worked for me, and I was able to connect the peers successfully.
If the peers are started from the same node, it's possible that you are mounting the same crypto material (the path to the mspconfig directory) for both peers. If that is the case, separate the directory structures for the two peers, keep their respective certificates in them, update the respective msp paths in the docker-compose file, and try to run again.
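One way to check the "same crypto material" theory is to compare the signing certificates each peer actually gets. The sketch below hashes the files under each peer's msp/signcerts directory and reports any certificate shared by more than one peer. The function names and the example paths are hypothetical. The only assumption about Fabric is the standard MSP layout with a signcerts subdirectory.

```python
import hashlib
import os
from collections import defaultdict


def signcert_digests(msp_dir):
    """Return SHA-256 digests of every file under <msp_dir>/signcerts."""
    cert_dir = os.path.join(msp_dir, "signcerts")
    digests = set()
    for name in sorted(os.listdir(cert_dir)):
        path = os.path.join(cert_dir, name)
        if os.path.isfile(path):
            with open(path, "rb") as handle:
                digests.add(hashlib.sha256(handle.read()).hexdigest())
    return digests


def peers_sharing_certs(msp_dirs):
    """Map each certificate digest used by more than one peer to those peers.

    msp_dirs: {peer_name: path_to_that_peer's_msp_directory}
    """
    owners = defaultdict(list)
    for peer, msp_dir in msp_dirs.items():
        for digest in signcert_digests(msp_dir):
            owners[digest].append(peer)
    return {digest: sorted(peers) for digest, peers in owners.items() if len(peers) > 1}


if __name__ == "__main__":
    # Hypothetical paths: replace with the msp directories mounted into each peer.
    example = {
        "peer0.org1": "/path/to/peer0/msp",
        "peer1.org1": "/path/to/peer1/msp",
    }
    if all(os.path.isdir(os.path.join(path, "signcerts")) for path in example.values()):
        print(peers_sharing_certs(example) or "no shared signing certificates")
```

Any non-empty result means two peers present the same identity, which would be consistent with the "same PKI-ID as myself" error.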

How to make kube-scheduler write its log to a file

The Kubernetes version is 1.2.
I want to watch the scheduler's log. How do I make kube-scheduler write its log to a file?
The kube-scheduler's configuration is at this path: /etc/kubernetes/scheduler.
And the global configuration is at this path: /etc/kubernetes/config.
So we can see these notes:
# logging to stderr means we get it in the systemd journal
KUBE_LOGTOSTDERR="--logtostderr=true"
# journal message level, 0 is debug
KUBE_LOG_LEVEL="--v=0"
If it is running under systemd, can you tail the service's journal, e.g. journalctl -u kube-scheduler -f (use the unit name from your system)?
Or, if it runs in a container, find the container id of the scheduler and tail it with docker: docker logs -f <container-id>