Kubespray scale Ansible playbook cannot find /etc/kubernetes/admin.conf - kubernetes

I want to extend my Kubernetes cluster by one node.
So I run the scale.yml Ansible playbook:
ansible-playbook -i inventory/local/hosts.ini --become --become-user=root scale.yml
But I get the following error when the control plane certificates are being uploaded:
TASK [Upload control plane certificates] ***************************************************************************************************************************************************
ok: [jay]
fatal: [sam]: FAILED! => {"changed": false, "cmd": ["/usr/local/bin/kubeadm", "init", "phase", "--config", "/etc/kubernetes/kubeadm-config.yaml", "upload-certs", "--upload-certs"], "delta": "0:00:00.039489", "end": "2022-01-08 11:31:37.708540", "msg": "non-zero return code", "rc": 1, "start": "2022-01-08 11:31:37.669051", "stderr": "error execution phase upload-certs: failed to load admin kubeconfig: open /etc/kubernetes/admin.conf: no such file or directory\nTo see the stack trace of this error execute with --v=5 or higher", "stderr_lines": ["error execution phase upload-certs: failed to load admin kubeconfig: open /etc/kubernetes/admin.conf: no such file or directory", "To see the stack trace of this error execute with --v=5 or higher"], "stdout": "", "stdout_lines": []}
Does anyone have an idea what the problem could be?
Thanks in advance.

I solved it myself.
I copied /etc/kubernetes/admin.conf and /etc/kubernetes/ssl/ca.* to the new node, and now the scale playbook works. Maybe this is not the right way, but it worked...
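For reference, a minimal sketch of that copy, run from a machine that can reach both nodes as root (the hostnames jay, an existing control-plane node, and sam, the new node, are taken from the error output above; the paths are kubespray's defaults):
# Fetch admin.conf and the CA material from the existing control-plane node
scp root@jay:/etc/kubernetes/admin.conf /tmp/admin.conf
scp "root@jay:/etc/kubernetes/ssl/ca.*" /tmp/
# Push them to the new node (assumes /etc/kubernetes/ssl exists there)
scp /tmp/admin.conf root@sam:/etc/kubernetes/admin.conf
scp /tmp/ca.* root@sam:/etc/kubernetes/ssl/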

Parcel: configured port 1234 could not be used

I need to run my ReactJS application on the port 1234 but when I run yarn dev, I get the following:
$ parcel src/index.html --port 1234
Server running at http://localhost:2493 - configured port 1234 could not be used.
√ Built in 11.45s.
It doesn't tell me why it can't run on port 1234, so I suspected that the port might be in use already. According to this answer, the following should tell me what process is using that port.
Get-Process -Id (Get-NetTCPConnection -LocalPort portNumber).OwningProcess
But that didn't help, it gave the following message:
Get-NetTCPConnection : No MSFT_NetTCPConnection objects found with property 'LocalPort' equal to '1234'. Verify the value of the property and retry.
Which I guess means there is no process bound to port 1234. But if that is the case, why can't I bind to that port?
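As a cross-check (this is Windows, given the PowerShell above), netstat should list any listener on that port; a quick probe, run in PowerShell:
# Show any TCP endpoint on 1234 with its owning PID; empty output means no listener
netstat -ano | findstr :1234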
My package.json is as follows:
{
  "name": "bejebeje.react",
  "version": "1.0.0",
  "description": "bejebeje's react-js frontend",
  "main": "index.js",
  "repository": "git@github.com:JwanKhalaf/Bejebeje.React.git",
  "author": "John",
  "license": "GPL-3.0",
  "dependencies": {
    "@fortawesome/fontawesome-svg-core": "^1.2.19",
    "@fortawesome/free-brands-svg-icons": "^5.9.0",
    "@fortawesome/free-solid-svg-icons": "^5.9.0",
    "@fortawesome/pro-light-svg-icons": "^5.9.0",
    "@fortawesome/pro-regular-svg-icons": "^5.9.0",
    "@fortawesome/pro-solid-svg-icons": "^5.9.0",
    "@fortawesome/react-fontawesome": "^0.1.4",
    "@reach/router": "^1.2.1",
    "oidc-client": "^1.8.2",
    "react": ">=16",
    "react-dom": "^0.14.9 || ^15.3.0 || ^16.0.0-rc || ^16.0",
    "react-icons": "^3.7.0",
    "styled-components": "^4.3.2"
  },
  "scripts": {
    "dev": "parcel src/index.html --port 1234",
    "build": "parcel build src/index.html"
  },
  "devDependencies": {
    "@fortawesome/fontawesome-pro": "^5.9.0",
    "axios": "^0.19.0",
    "parcel-bundler": "^1.12.3",
    "prettier": "^1.16.4",
    "sass": "^1.22.5"
  }
}
After creating a little C# web server that would attempt to bind to port 1234, I still couldn't get it to work. It would try to bind, but would throw an exception saying:
An attempt was made to access a socket in a way forbidden by its access permissions.
Anyway, after much pain and research, here is what finally worked:
First, disable Hyper-V (this will restart your PC, so ensure all work is saved). In PowerShell (as admin) run the following:
dism.exe /Online /Disable-Feature:Microsoft-Hyper-V
When your PC has restarted, reserve the port you want so Hyper-V doesn't grab it back; again via PowerShell as admin, run the following (1234 being the port in question here):
netsh int ipv4 add excludedportrange protocol=tcp startport=1234 numberofports=1
Now finally re-enable Hyper-V (your PC will restart again), again via PowerShell as admin:
dism.exe /Online /Enable-Feature:Microsoft-Hyper-V /All
When your PC is back up, you should be able to bind to that port successfully.
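For what it's worth, you can check up front whether Hyper-V is the culprit: it reserves whole TCP port ranges from user binding, and those ranges can be listed (PowerShell as admin):
# If 1234 falls inside one of the printed ranges, no process owns it, yet you cannot bind it
netsh int ipv4 show excludedportrange protocol=tcp
That would also explain why Get-NetTCPConnection found nothing: the port is reserved by the system, not held by a process.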
Testing locally with Parcel, I was able to reproduce the same error by deliberately forcing the following states:
Port in use
Unauthorized use of the port
Invalid port
The error message you get is pretty cruddy. To determine what the actual error is, you can try to bind the port using any other language or program you wish. Since we're in JavaScript, we can use this script:
require('http').createServer(function(){}).listen(1234);
In my case I had already bound port 1234 in a different application, and I received the following error:
events.js:174
throw er; // Unhandled 'error' event
^
Error: listen EADDRINUSE: address already in use :::1234
at Server.setupListenHandle [as _listen2] (net.js:1270:14)
at listenInCluster (net.js:1318:12)
at Server.listen (net.js:1405:7)
at Object.<anonymous> (C:\workdir\ETL_feed\listen.js:1:106)
at Module._compile (internal/modules/cjs/loader.js:701:30)
at Object.Module._extensions..js (internal/modules/cjs/loader.js:712:10)
at Module.load (internal/modules/cjs/loader.js:600:32)
at tryModuleLoad (internal/modules/cjs/loader.js:539:12)
at Function.Module._load (internal/modules/cjs/loader.js:531:3)
at Function.Module.runMain (internal/modules/cjs/loader.js:754:12)
Emitted 'error' event at:
at emitErrorNT (net.js:1297:8)
at process._tickCallback (internal/process/next_tick.js:63:19)
at Function.Module.runMain (internal/modules/cjs/loader.js:757:11)
at startup (internal/bootstrap/node.js:283:19)
at bootstrapNodeJSCore (internal/bootstrap/node.js:622:3)
As you can see, this error message is much more detailed than Parcel's. When binding ports, keep in mind that many are taken by existing applications; for example, Wikipedia reports that port 1234 is used by VLC media player (though it reports UDP only).
I was getting this error on a project, but it turns out that the issue was that my project was running on HTTPS while Parcel.js (v2.0) was bundling over HTTP.
This is my original Parcel.js command:
parcel serve path/to/my.file
And this is the fixed version, which bundles over HTTPS:
parcel serve path/to/my.file --https
I'm sure my solution won't fix every instance of the error, but it's worth considering.

Kubespray fails with "Found multiple CRI sockets, please use --cri-socket to select one"

Problem encountered
When deploying a cluster with Kubespray, CRI-O and Cilium, I get an error about having multiple CRI sockets to choose from.
Full error
fatal: [p3kubemaster1]: FAILED! => {"changed": true, "cmd": " mkdir -p /etc/kubernetes/external_kubeconfig && /usr/local/bin/kubeadm init phase kubeconfig admin --kubeconfig-dir /etc/kubernetes/external_kubeconfig --cert-dir /etc/kubernetes/ssl --apiserver-advertise-address 10.10.3.15 --apiserver-bind-port 6443 >/dev/null && cat /etc/kubernetes/external_kubeconfig/admin.conf && rm -rf /etc/kubernetes/external_kubeconfig ", "delta": "0:00:00.028808", "end": "2019-09-02 13:01:11.472480", "msg": "non-zero return code", "rc": 1, "start": "2019-09-02 13:01:11.443672", "stderr": "Found multiple CRI sockets, please use --cri-socket to select one: /var/run/dockershim.sock, /var/run/crio/crio.sock", "stderr_lines": ["Found multiple CRI sockets, please use --cri-socket to select one: /var/run/dockershim.sock, /var/run/crio/crio.sock"], "stdout": "", "stdout_lines": []}
Interesting part
kubeadm init phase kubeconfig admin --kubeconfig-dir /etc/kubernetes/external_kubeconfig [...] >/dev/null,"stderr": "Found multiple CRI sockets, please use --cri-socket to select one: /var/run/dockershim.sock, /var/run/crio/crio.sock"}
What I've tried
1) I've tried to set the --cri-socket flag inside /var/lib/kubelet/kubeadm-flags.env:
KUBELET_KUBEADM_ARGS="--container-runtime=remote --container-runtime-endpoint=/var/run/crio/crio.sock --cri-socket=/var/run/crio/crio.sock"
=> Makes no difference
2) I've checked /etc/kubernetes/kubeadm-config.yaml, but it already contains the following section:
apiVersion: kubeadm.k8s.io/v1beta2
kind: InitConfiguration
localAPIEndpoint:
  advertiseAddress: 10.10.3.15
  bindPort: 6443
certificateKey: 9063a1ccc9c5e926e02f245c06b8d9f2ff3xxxxxxxxxxxx
nodeRegistration:
  name: p3kubemaster1
  taints:
  - effect: NoSchedule
    key: node-role.kubernetes.io/master
  criSocket: /var/run/crio/crio.sock
=> It already ends with the criSocket key, so nothing to do...
3) Tried to edit the Ansible script to add --cri-socket to the existing command, but it fails with "unknown command --cri-socket".
Existing:
{% if kubeadm_version is version('v1.14.0', '>=') %}
init phase
Tried:
{% if kubeadm_version is version('v1.14.0', '>=') %}
init phase --cri-socket /var/run/crio/crio.sock
Theories
It seems that the problem comes from the command kubeadm init phase, which is not compatible with the --cri-socket flag (see point 3).
Even though the correct socket is set in the config file (see point 2), kubeadm init phase is not using it.
Any ideas would be appreciated ;-)
Thanks!
This worked for me with multiple CRI sockets:
kubeadm init --pod-network-cidr=10.244.0.0/16 --cri-socket=unix:///var/run/cri-dockerd.sock
To pull images before initialization when multiple CRIs are present:
kubeadm config images pull --cri-socket=unix:///var/run/cri-dockerd.sock
You can choose the CRI socket path from the following table (see the original documentation):
Runtime                              Path to Unix domain socket
containerd                           unix:///var/run/containerd/containerd.sock
CRI-O                                unix:///var/run/crio/crio.sock
Docker Engine (using cri-dockerd)    unix:///var/run/cri-dockerd.sock
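If you are unsure which runtimes are actually present on a node (and hence what kubeadm is auto-detecting), a quick check of the socket files, assuming the default paths from the table plus the legacy dockershim path:
# Sockets that exist are the ones kubeadm will find; two or more triggers the error above
ls -l /var/run/dockershim.sock /var/run/crio/crio.sock \
      /var/run/containerd/containerd.sock /var/run/cri-dockerd.sock 2>/dev/null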
I finally got it!
The initial kubespray command was:
kubeadm init phase kubeconfig admin --kubeconfig-dir {{ kube_config_dir }}/external_kubeconfig
⚠️ It seems that this --kubeconfig-dir form of the command does not take the CRI socket configuration into account.
So I changed the line to:
kubeadm init phase kubeconfig admin --config /etc/kubernetes/kubeadm-config.yaml
For people having similar issues:
The InitConfig part that made it work on the master is the following:
apiVersion: kubeadm.k8s.io/v1beta2
kind: InitConfiguration
localAPIEndpoint:
  advertiseAddress: 10.10.3.15
  bindPort: 6443
certificateKey: 9063a1ccc9c5e926e02f245c06b8d9f2ff3c1eb2dafe5fbe2595ab4ab2d3eb1a
nodeRegistration:
  name: p3kubemaster1
  taints:
  - effect: NoSchedule
    key: node-role.kubernetes.io/master
  criSocket: /var/run/crio/crio.sock
In kubespray you must update the file roles/kubernetes/client/tasks/main.yml around line 57.
You'll have to comment out the initial --kubeconfig-dir section and replace it with the path of the InitConfiguration file.
For me it was generated by kubespray in /etc/kubernetes/kubeadm-config.yaml on the kube master. Check that this file exists on your side and that it contains the criSocket key in the nodeRegistration section.
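As a sketch, the change amounts to the following (the exact line and surrounding Jinja vary by kubespray version):
# roles/kubernetes/client/tasks/main.yml (sketch, not an exact diff)
# before:
#   kubeadm init phase kubeconfig admin --kubeconfig-dir {{ kube_config_dir }}/external_kubeconfig
# after:
#   kubeadm init phase kubeconfig admin --config /etc/kubernetes/kubeadm-config.yaml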
I did some research and came upon this GitHub thread, which then pointed me to another one.
This seems to be a kubeadm issue which was already fixed, so the solution is available in v1.15.
Could you please upgrade to that version (I am not sure which one you are using, based on both of your questions that I have worked on) and see if the problem still persists?

Cannot upgrade node using kubespray

I have a test Kubernetes on-premise cluster on CentOS 7.4. The current Kubernetes version is 1.10.4, and I am trying to upgrade to 1.11.5 using kubespray.
The command is:
ansible-playbook upgrade-cluster.yml -b -i inventory/k8s-test/hosts.ini -e kube_version=v1.11.5
Masters are upgraded successfully, but nodes are not.
The error is:
fatal: [kubernodetst1]: FAILED! => {"changed": true, "cmd":
["/usr/local/bin/kubeadm", "join", "--config",
"/etc/kubernetes/kubeadm-client.conf",
"--ignore-preflight-errors=all",
"--discovery-token-unsafe-skip-ca-verification"], "delta":
"0:00:00.040038", "end": "2018-12-13 15:55:56.162387", "msg":
"non-zero return code", "rc": 3, "start": "2018-12-13
15:55:56.122349", "stderr": "discovery: Invalid value: \"\": using
token-based discovery without discoveryTokenCACertHashes can be
unsafe. set --discovery-token-unsafe-skip-ca-verification to
continue", "stderr_lines": ["discovery: Invalid value: \"\": using
token-based discovery without discoveryTokenCACertHashes can be
unsafe. set --discovery-token-unsafe-skip-ca-verification to
continue"], "stdout": "", "stdout_lines": []}
You have an incorrect CA on the nodes; regenerate all the certificates and try again.
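If you want to verify that first, one way is to compute the discovery hash of the CA on the master and on a failing node and compare them; a sketch, assuming kubespray's default certificate location /etc/kubernetes/ssl/ca.crt:
# Standard kubeadm CA cert hash: sha256 of the CA's DER-encoded public key
openssl x509 -pubkey -in /etc/kubernetes/ssl/ca.crt \
  | openssl rsa -pubin -outform der 2>/dev/null \
  | openssl dgst -sha256 -hex | sed 's/^.* //'
If the hash differs between the master and a node, that node is indeed holding stale CA material.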

Ansible Copy Module Fails

I am trying to copy over the "resolv.conf" file from one machine to another and overwrite the old one. This operation works on all but 4 of the 40+ servers... I get an error that the file could not be replaced because the operation is not permitted. I have pasted the contents of the Playbook related to the failing operation below.
- hosts: all
  remote_user: root
  ...
    - name: Copy over the updated DNS configuration file
      copy: src=/etc/resolv.conf dest=/etc/resolv.conf
It gives me the following error message for all 4 servers.
fatal: [server-name]: FAILED! => {"changed": false, "checksum": "9925f1a81f849f373f860c3156d19edcd1c002f2", "failed": true, "msg": "Could not replace file: /root/.ansible/tmp/ansible-tmp-1469481567.72-275811900408782/source to /etc/resolv.conf: [Errno 1] Operation not permitted"}
I just don't understand what the problem could be, since I am accessing the machines as the root user and the Playbook succeeds on the majority of the servers, many with the exact same configuration and settings. For example, it succeeds on the server "server-analytical1" but fails on the server "server-analytical2". So, does anyone have any insight into why the Playbook would fail for only a few servers even though they're similar to or the same as other servers that succeeded?
Is the immutable bit set on the target file? Try lsattr /etc/resolv.conf to check, and chattr -i /etc/resolv.conf to unset it if it is.
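Spelled out (a common reason the bit gets set is to stop DHCP clients or NetworkManager from rewriting resolv.conf):
# An "i" in the attribute flags means immutable: even root cannot replace the file
lsattr /etc/resolv.conf
# Clear the bit, then re-run the playbook
chattr -i /etc/resolv.conf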

Ansible hangs systemd-journald (stops logging and floods dmesg) with ansible task authorized_key

What's wrong in my playbook that makes journald stop logging?
/var/log/messages is not updated any more!
I can reproduce the issue with this simple isolated playbook:
- name: Reproduce journald hang
  hosts: test
  vars:
    - keys:
        - kp_XXX.pub
        - kp_YYY.pub
  tasks:
    - name: Verify journalctl before
      command: journalctl --verify
    - name: Add SSH keys
      authorized_key: user=cloud key="{{ item }}"
      with_file: keys
    - name: Verify journalctl after
      command: journalctl --verify
The key XXX will make journald hang, but not the YYY one.
Reproduced on CentOS 7.2 with Ansible 1.9.4.
File kp_XXX.pub:
ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQDCLd9k03Hvf3QVL8+dYd1KZY9p1ju/RkxHr+t6l6YbMcMfYcLHW6lsNIw2aLC7qpRopQPe/prQZkbXQBy8sYzNUcVtohPTD/V6wX7RXDCiVME9uUztY96Wust1Uc4Z28DhWyC55WFKhetGzfyxK+hMrtORnzdruo/bxHKmGu3rT5HYquB8SlPN/cSG/7itwy6QkXsqzmQUbEvaLPZNwU7qd9LiySFxsbhI2vJz+FiBS+CzkoTKOSZt60I0jRs4wIjXOZjQApcgddGa2ls3vq5HH39Xdr66+PnRU/rrRpaMTrcOTLPzzeWUQoF8VbkSiDXsI8ds+M842DKAT0DFVXnR kp_XXX
File kp_YYY.pub:
ssh-rsa AAAAB3NzaC1yc2EAAAABJQAAAQBM9IuDxubRnbFh1e1dvFSKE91vrME5h/nQMsZo1Bmt8FXIQ7wJdNh+ANLYyQA7Q0tiXD1n97QQ9r89iwHFEUZVSXc7VM01AE27N45ybfLmLwtNm+kny6ncoPy7+MHcOQS9Ra56u6Bi6xXUc7vM4pL2iB/m0GnUSmECZZ5EVuOpMeJltf04/+PldQGOqxp9BzVF8XKEPlW5uc6UBesPCoHpR9lGA5UIIYq1sUDIGBy3T7FEXu8KhiHrtb1wuDGJU62SqR/fxgJDypjAtedm41TFcTZOMqTR29KYCKC4OjRaUTu7kf4rWq7/HWJViK2NLaeoy9xyG1BUUTqrGY6qRo85 kp_YYY
Although both keys are added, the third task fails and journald hangs:
TASK: [Verify journalctl after] ***********************************************
failed: [test] => {"changed": true, "cmd": ["journalctl", "--verify"], "delta": "0:00:00.008351", "end": "2016-03-31 12:49:58.669585", "rc": 1, "start": "2016-03-31 12:49:58.661234", "warnings": []}
stderr: 248668: invalid object
File corruption detected at /run/log/journal/cf9b563caf2bc11cab56d6a504ff6a29/system.journal:248668 (of 8388608 bytes, 28%).
FAIL: /run/log/journal/cf9b563caf2bc11cab56d6a504ff6a29/system.journal (Cannot assign requested address)
dmesg logs plenty of these lines:
[ 761.806277] systemd-journald[366]: Failed to write entry (27 items, 719 bytes), ignoring: Cannot assign requested address
[ 761.807514] systemd-journald[366]: Failed to write entry (23 items, 633 bytes), ignoring: Cannot assign requested address
[ 761.859245] systemd-journald[366]: Failed to write entry (24 items, 718 bytes), ignoring: Cannot assign requested address
When verifying journald files with journalctl --verify:
248668: invalid object
File corruption detected at /run/log/journal/cf9b563caf2bc11cab56d6a504ff6a29/system.journal:248668 (of 8388608 bytes, 28%).
FAIL: /run/log/journal/cf9b563caf2bc11cab56d6a504ff6a29/system.journal (Cannot assign requested address)
Is this an error on my side, in Ansible, or in CentOS?
How can it be fixed?
Found these 2 related links:
https://access.redhat.com/solutions/2117311?tour=6#comments
Claudio's blog
Open the console!
Become root.
Examine the output of "journalctl -xn" and "dmesg".
If you get this or something similar:
[ 12.713167] systemd-journald[113]: Failed to write entry (9 items, 245 bytes), ignoring: Cannot assign requested address
then verify the journals with "journalctl --verify".
Check that the logs are within size limits with "journalctl --disk-usage".
By then you have an idea of which logs are corrupted - they show up in red, no way you can ignore them.
Create a new folder and move the corrupted files there.
Re-run "journalctl --verify" to make sure.
Reboot.
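A sketch of those steps in shell form, using the journal path from the error output above (the machine-id directory will differ on your host, and the destination folder is arbitrary):
journalctl --verify              # corrupted files are flagged in the output
journalctl --disk-usage          # sanity-check journal sizes
mkdir -p /root/journal-corrupt   # somewhere to park the bad files
mv /run/log/journal/cf9b563caf2bc11cab56d6a504ff6a29/system.journal /root/journal-corrupt/
journalctl --verify              # re-check before rebooting
reboot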