kubeadm join failing on Pi 3 Model B: cannot create certificate signing request: the server could not find the requested resource

I have set up a cluster of four Raspberry Pi 3 Model B boards and installed HypriotOS.
The master node is running fine; however, when I issue the join command on one of the other Pi 3 nodes it fails with the following error:
HypriotOS/armv7: root@black-pearl_1 in ~
$ kubeadm join --token=f5ffb9.0fefbf6e0f289a61 192.168.1.20 --skip-preflight-checks
[kubeadm] WARNING: kubeadm is in alpha, please do not use it for production clusters.
[preflight] Skipping pre-flight checks
[tokens] Validating provided token
[discovery] Created cluster info discovery client, requesting info from "http://192.168.1.20:9898/cluster-info/v1/?token-id=f5ffb9"
[discovery] Cluster info object received, verifying signature using given token
[discovery] Cluster info signature and contents are valid, will use API endpoints [https://192.168.1.20:6443]
[bootstrap] Trying to connect to endpoint https://192.168.1.20:6443
[bootstrap] Detected server version: v1.6.0
[bootstrap] Successfully established connection with endpoint "https://192.168.1.20:6443"
[csr] Created API client to obtain unique certificate for this node, generating keys and certificate signing request
failed to request signed certificate from the API server [cannot create certificate signing request: the server could not find the requested resource]
I have tried these solutions ("kubeadm join failing", "Unable to request signed cert") and neither of them works; I get the same error on each of the other three Pi nodes when running the join.
To replicate this, just follow this setup procedure:
https://blog.hypriot.com/post/setup-kubernetes-raspberry-pi-cluster/
Can anyone help me understand what is going wrong and how to fix it? I cannot find a solution in the Kubernetes troubleshooting docs.
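For context, the error is returned when the node tries to create a CertificateSigningRequest object on the master and the API server answers that the resource does not exist. A quick sanity check on the master (just a sketch; the grep pattern is only illustrative) is to confirm the tool versions on master and node match and that the certificates API group is actually being served:
# on the master
$ kubeadm version
$ kubectl version
# is the certificates API group (where the CSR would be created) served at all?
$ kubectl api-versions | grep certificates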
To respond to Karun's question about using the --skip-preflight-checks flag: when I run the join command without it, it simply fails with the following:
HypriotOS/armv7: root@black-pearl_2 in ~
$ kubeadm join --token=f5ffb9.0fefbf6e0f289a61 192.168.1.20
[kubeadm] WARNING: kubeadm is in alpha, please do not use it for production clusters.
[preflight] Running pre-flight checks
[preflight] The system verification failed. Printing the output from the verification:
OS: Linux
KERNEL_VERSION: 4.4.50-hypriotos-v7+
CONFIG_NAMESPACES: enabled
CONFIG_NET_NS: enabled
CONFIG_PID_NS: enabled
CONFIG_IPC_NS: enabled
CONFIG_UTS_NS: enabled
CONFIG_CGROUPS: enabled
CONFIG_CGROUP_CPUACCT: enabled
CONFIG_CGROUP_DEVICE: enabled
CONFIG_CGROUP_FREEZER: enabled
CONFIG_CGROUP_SCHED: enabled
CONFIG_CPUSETS: enabled
CONFIG_MEMCG: enabled
CONFIG_INET: enabled
CONFIG_EXT4_FS: enabled
CONFIG_PROC_FS: enabled
CONFIG_NETFILTER_XT_TARGET_REDIRECT: enabled (as module)
CONFIG_NETFILTER_XT_MATCH_COMMENT: enabled (as module)
CONFIG_OVERLAY_FS: enabled (as module)
CONFIG_AUFS_FS: not set - Required for aufs.
CONFIG_BLK_DEV_DM: enabled (as module)
CGROUPS_CPU: enabled
CGROUPS_CPUACCT: enabled
CGROUPS_CPUSET: enabled
CGROUPS_DEVICES: enabled
CGROUPS_FREEZER: enabled
CGROUPS_MEMORY: enabled
DOCKER_VERSION: 17.03.0-ce
[preflight] Some fatal errors occurred:
unsupported docker version: 17.03.0-ce
hostname "black-pearl_2" must match the regex [a-z0-9]([-a-z0-9]*[a-z0-9])?(\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)* (e.g. 'example.com')
Port 10250 is in use
/etc/kubernetes/manifests is not empty
/var/lib/kubelet is not empty
/etc/kubernetes/kubelet.conf already exists
[preflight] If you know what you are doing, you can skip pre-flight checks with `--skip-preflight-checks`
The --skip-preflight-checks flag was also required to get the master node running. I therefore assume this issue must be ARM-specific, as the suggested fix of copying /etc/kubernetes/* from the master also fails to solve the issue.
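As an aside, one of the preflight failures above is genuine regardless of architecture: the underscore in the hostname black-pearl_2 is not a valid DNS label. Renaming the node before joining is roughly the following (assuming HypriotOS ships systemd's hostnamectl; otherwise edit /etc/hostname and reboot):
$ sudo hostnamectl set-hostname black-pearl-2
$ sudo sed -i 's/black-pearl_2/black-pearl-2/g' /etc/hosts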

Any reason why you are passing --skip-preflight-checks as a flag in your kubeadm join command? Is anything failing there? Maybe that can lead to some clues.
I have not worked on a Raspberry Pi, but I faced this sort of issue in an Ubuntu environment. For some reason the /etc/kubernetes folder wasn't there on the slave node (the one where I was firing the kubeadm join command). I just copied the contents of /etc/kubernetes/* from the master node into the same location, /etc/kubernetes, on the slave nodes.
After this, run kubeadm join with --skip-preflight-checks explicitly.
This time kubeadm got the required resources and the join completed.
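For reference, that copy-then-join sequence is roughly the following (the master IP is the one from the question; adjust the user and paths to your own setup):
# on the node that should join
$ sudo scp -r root@192.168.1.20:/etc/kubernetes /etc/
$ sudo kubeadm join --token=<token> 192.168.1.20 --skip-preflight-checks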

Related

Timeout on kubernetes api server logs

I have an issue installing k8s with kubespray. The problem is with the API server: on startup it complains about some timeout errors and goes down.
The bottom line of the long error message is this:
logging error output: "k8s\x00\n\f\n\x02v1\x12\x06Status\x12b\n\x04\n\x00\x12\x00\x12\aFailure\x1a9Timeout: request did not complete within allowed duration\"\aTimeout*\n\n\x00\x12\x00\x1a\x00(\x002\x000\xf8\x03\x1a\x00\"\x00"
Also, this is the result of the health check:
-> curl localhost:8080/healthz
[+]ping ok
[+]poststarthook/generic-apiserver-start-informers ok
[+]poststarthook/start-apiextensions-informers ok
[+]poststarthook/start-apiextensions-controllers ok
[-]poststarthook/bootstrap-controller failed: reason withheld
[+]poststarthook/extensions/third-party-resources ok
[-]poststarthook/ca-registration failed: reason withheld
[+]poststarthook/start-kube-apiserver-informers ok
[+]poststarthook/start-kube-aggregator-informers ok
[+]poststarthook/apiservice-registration-controller ok
[+]poststarthook/apiservice-status-available-controller ok
[+]poststarthook/kube-apiserver-autoregistration ok
[-]autoregister-completion failed: reason withheld
healthz check failed
I've changed the API server manifest and set --v=5, but I still don't see any useful logs.
How can I debug the issue?
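Before turning the verbosity up further, one quick thing to rule out is etcd itself. A direct probe from the master looks roughly like this (the endpoint and certificate paths are placeholders for whatever your deployment generated); a healthy etcd answers something like {"health": "true"}:
$ curl --cacert <etcd-ca.pem> --cert <etcd-client.pem> --key <etcd-client-key.pem> https://127.0.0.1:2379/health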
I suffered from the same problem recently, with the same health check log.
Etcd itself was OK: etcdctl could operate on it, and the apiserver reported etcd as OK too.
The k8s apiserver log only showed timeouts against etcd.
After I checked the sockets between etcd and the apiserver, I found that the apiserver had no connection to etcd at all.
So I checked the client certificate files and found that their validity period had expired, which meant the apiserver could not establish an SSL connection to etcd. But the apiserver never showed the actual error.
Hope this helps you.
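A sketch of the two checks described above (the certificate path is deployment-specific, so it is left as a placeholder):
# does the apiserver hold any open connection to etcd?
$ ss -tnp | grep 2379
# has the client certificate the apiserver presents to etcd expired?
$ openssl x509 -noout -enddate -in <apiserver-etcd-client.crt>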

Hyperledger-fabric v1.0.0 instantiate chaincode on kubernetes failed

I was testing Hyperledger Fabric v1.0.0 with Kubernetes. The fabric contains 2 orgs, 4 peers, 1 orderer, and a CLI. Things go well until I instantiate the chaincode from the CLI. The peer's error message is in the picture below: it says the image is missing, but the image was just created successfully. What's the problem and how can I solve it?
(screenshot: peer's error message)
The answer can be found in the Hyperledger RocketChat, on the #fabric-kubernetes channel:
"You basically need the peer to surface its dynamic IP (that's what AUTOADDRESS does) and then tell the chaincode to basically ignore the x509 CN (that's what SERVERHOSTOVERRIDE does), and the other part is you need the peer pod to be privileged so it has the rights to drive the docker-api."
Basically, there's lots to be learned from following the discussion from that point.
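In concrete terms, those two settings usually map to peer environment overrides along these lines (the variable names are an assumption based on Fabric's core.yaml overrides, and peer0.org1.example.com is only an illustrative name; verify against your Fabric version), in addition to running the peer pod with a privileged security context so it can drive the Docker API:
CORE_PEER_ADDRESSAUTODETECT=true                          # peer advertises its dynamic pod IP ("AUTOADDRESS")
CORE_PEER_TLS_SERVERHOSTOVERRIDE=peer0.org1.example.com   # expected TLS server name, so the x509 CN mismatch is ignored ("SERVERHOSTOVERRIDE")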

installing kubernetes on coreos with rkt and automated script

I'm trying to install Kubernetes with rkt on my real (not virtual) CoreOS servers at home using the scripts at https://github.com/coreos/coreos-kubernetes/tree/master/multi-node/generic and I have some questions.
My etcd2 is using TLS keys; I can't see anywhere in the scripts where I can define where the certificates are located.
Can I supply a domain instead of an IP for ADVERTISE_IP and CONTROLLER_ENDPOINT?
When I tried to install Kubernetes manually I needed to start the rkt API service. The documents don't state that it is needed here; does that mean I don't need it if I use these scripts, or is it just something that's missing from the documents?
thanks!
Update:
Rob, thank you so much for your response. I wasn't clear enough regarding etcd2. I already have etcd2 with TLS installed and properly configured on my CoreOS servers, so I configured my etcd servers in the controller-install.sh file:
export ETCD_ENDPOINTS="https://coreos-2.tux-in.com:2379,https://coreos-3.tux-in.com:2379"
But when I run the controller-install.sh script, it just repeats the following output:
Waiting for etcd...
Trying: https://coreos-2.tux-in.com:2379
Trying: https://coreos-3.tux-in.com:2379
Trying: https://coreos-2.tux-in.com:2379
Trying: https://coreos-3.tux-in.com:2379
...
So I was guessing it's because I didn't define the etcd-related TLS certificates in the controller script, and that is why it gets stuck in that phase.
On my MacBook Pro laptop I have the following alias configured:
alias myetcdctl="~/apps/etcd-v3.0.8-darwin-amd64/etcdctl --endpoint=https://coreos-2.tux-in.com:2379 --ca-file=/Users/ufk/Projects/coreos/tux-in/etcd/certs/certs-names/ca.pem --cert-file=/Users/ufk/Projects/coreos/tux-in/etcd/certs/certs-names/etcd1.pem --key-file=/Users/ufk/Projects/coreos/tux-in/etcd/certs/certs-names/etcd1-key.pem --timeout=10s"
So when I run myetcdctl member list I get:
8832ce6a269a7dac: name=ccff826d5f564c67abf35467306f80a0 peerURLs=https://coreos-3.tux-in.com:2380 clientURLs=https://coreos-3.tux-in.com:2379 isLeader=true
a2c0ac9708ef90fc: name=dc38bc8f20e64940b260d3f7b260430d peerURLs=https://coreos-2.tux-in.com:2380 clientURLs=https://coreos-2.tux-in.com:2379 isLeader=false
So I'm guessing that I don't really have a problem there.
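One thing I could still try is the same health check from the controller node itself, with and without the client certificate (the ca/cert/key files are the same ones my laptop alias uses and would need to be copied onto the controller; the bare file names are placeholders):
$ curl https://coreos-2.tux-in.com:2379/health
$ curl --cacert ca.pem --cert etcd1.pem --key etcd1-key.pem https://coreos-2.tux-in.com:2379/health
If only the second one succeeds, the script's probe is presumably failing the TLS handshake rather than etcd actually being down.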
Any ideas?
Thanks!
My etcd2 is using TLS keys; I can't see anywhere in the scripts where I can define where the certificates are located.
These scripts don't start an etcd server. You will need to set one up manually, and you will be able to use TLS and as many nodes as you would like. This isn't clear in the current form of the document; I will attempt a PR to fix it.
Can I supply a domain instead of an IP for ADVERTISE_IP and CONTROLLER_ENDPOINT?
Only CONTROLLER_ENDPOINT can be a domain name.
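So, for illustration (k8s.tux-in.com is a hypothetical controller DNS name and 10.0.0.10 a placeholder IP, not values taken from the scripts):
export ADVERTISE_IP=10.0.0.10                       # must be an IP address
export CONTROLLER_ENDPOINT=https://k8s.tux-in.com   # may be a DNS name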
When I tried to install Kubernetes manually I needed to start the rkt API service. The documents don't state that it is needed here; does that mean I don't need it if I use these scripts, or is it just something that's missing from the documents?
These scripts include and start the rkt API service. As you can see below, the unit also has a Restart parameter set (source):
[Unit]
Before=kubelet.service
[Service]
ExecStart=/usr/bin/rkt api-service
Restart=always
RestartSec=10
[Install]
RequiredBy=kubelet.service
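To confirm the scripts actually installed and started it on a node, something like the following should do (the exact unit name is an assumption; list the units if it differs):
$ systemctl list-units --all | grep -i rkt
$ systemctl status rkt-api.service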

Kube-apiserver complains about remote error bad certificate

I reinstalled some nodes and a master. Now on the master I am getting:
Sep 15 04:53:58 master kube-apiserver[803]: I0915 04:53:58.413581 803 logs.go:41] http: TLS handshake error from $ip:54337: remote error: bad certificate
Where $ip is one of the nodes.
So I likely need to delete or recreate certificates. What would the location of those be? Any recommended commands to recreate or remove those or copy them from node to master or vice versa? Whatever gets me past this error message...
Take a look through the Creating Certificates section of authentication.md. It walks you through the certificates that you need to create and how to pass them to the system components, and you should be able to use that to re-generate certificates for your cluster.
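The doc essentially walks you through creating a CA and signing the server and client certificates with it. A bare-bones openssl sketch of that flow (no SANs or client certs shown, so treat it as orientation only, not a drop-in replacement for the doc):
$ openssl genrsa -out ca.key 2048
$ openssl req -x509 -new -nodes -key ca.key -subj "/CN=kubernetes-ca" -days 365 -out ca.crt
$ openssl genrsa -out apiserver.key 2048
$ openssl req -new -key apiserver.key -subj "/CN=kube-apiserver" -out apiserver.csr
$ openssl x509 -req -in apiserver.csr -CA ca.crt -CAkey ca.key -CAcreateserial -out apiserver.crt -days 365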

Not able to connect to cluster. Facing Certificate signed by unknown authority

I am not sure whether what I am trying to do is possible or the correct way to do it.
One of my colleagues spun up a Kubernetes GCE cluster (with 1 master and 4 minions) in a project which is shared with me with owner access.
After setup he shared his ~/.kubernetes_auth keys along with .kubecfg.crt, .kubecfg.ca.crt and .kubecfg.key. I copied all of them to my home folder and set up the Kubernetes workspace.
I also set the project name as the default project in gcloud config, and now I can connect to the master and slaves using 'gcutil ssh --zone us-central1-b kubernetes-master'.
But when I try to list the existing pods using 'cluster/kubecfg.sh list pods', I see:
"F1017 21:05:31.037148 18021 kubecfg.go:422] Got request error: Get https://107.178.208.109/api/v1beta1/pods?namespace=default: x509: certificate signed by unknown authority (possibly because of "crypto/rsa: verification error" while trying to verify candidate authority certificate "ChangeMe")
I tried to debug from my side but failed to come to any conclusion. Any sort of clue would be helpful.
You can also copy the cert files off of the master again. They are located in /usr/share/nginx on the master.
It is probably due to a not-yet-implemented feature; see this issue:
https://github.com/GoogleCloudPlatform/kubernetes/issues/1886
You can copy the files from /usr/share/nginx/... on the master
into your home dir and try again.
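A sketch of doing that with the same gcutil access used in the question (the file names under /usr/share/nginx vary between setups, so list them first and then copy them into the ~/.kubecfg.* files the question mentions):
$ gcutil ssh --zone us-central1-b kubernetes-master
kubernetes-master$ sudo ls /usr/share/nginx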
I figured out a workaround: set the -insecure_skip_tls_verify option
In kubecfg.sh, change the code near the bottom to:
else
  auth_config=(
    "-insecure_skip_tls_verify"
  )
fi
Obviously this is insecure and you are putting yourself at risk of a man-in-the-middle attack, etc.