Kubernetes image from local registry - kubernetes

I want to deploy a pod using a Docker image which has been pushed to a private registry.
So far, I've used the following command to install the registry and push the image:
# Build the Docker image
DOCKER_IMAGE="truc/tf-http-server:0.1"
cd docker
docker build -t $DOCKER_IMAGE .
cd ..
# Install Registry V2
docker run -d -p 5000:5000 --restart=always --name registry registry:2
# Push image
docker tag $DOCKER_IMAGE localhost:5000/$DOCKER_IMAGE
docker push localhost:5000/$DOCKER_IMAGE
# Add to known repository
sudo bash -c 'cat << EOF > /etc/docker/daemon.json
{
  "insecure-registries" : [ "192.168.1.37:5000" ]
}
EOF'
sudo systemctl daemon-reload
sudo systemctl restart docker
Pulling the image works directly from Docker:
$ sudo docker pull 192.168.1.37:5000/truc/tf-http-server:0.1
0.1: Pulling from truc/tf-http-server
Digest: sha256:b09c10375f1e90346f9b0c4bfb2bdfc7df919a4c89aaebfb433f2d845b37a960
Status: Downloaded newer image for 192.168.1.37:5000/truc/tf-http-server:0.1
192.168.1.37:5000/truc/tf-http-server:0.1
When I try to deploy the image from Kubernetes, I get the following error:
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 29s default-scheduler Successfully assigned default/tf-http-server-nvl9v to worker01
Normal Pulling 16s (x2 over 29s) kubelet Pulling image "192.168.1.37:5000/truc/tf-http-server:0.1"
Warning Failed 16s (x2 over 29s) kubelet Failed to pull image "192.168.1.37:5000/truc/tf-http-server:0.1": rpc error: code = Unknown desc = failed to pull and unpack image "192.168.1.37:5000/truc/tf-http-server:0.1": failed to resolve reference "192.168.1.37:5000/truc/tf-http-server:0.1": failed to do request: Head "https://192.168.1.37:5000/v2/truc/tf-http-server/manifests/0.1": http: server gave HTTP response to HTTPS client
Warning Failed 16s (x2 over 29s) kubelet Error: ErrImagePull
Normal BackOff 3s (x2 over 28s) kubelet Back-off pulling image "192.168.1.37:5000/truc/tf-http-server:0.1"
Warning Failed 3s (x2 over 28s) kubelet Error: ImagePullBackOff
It seems as if access to the repository were forbidden. Is there a way to make it reachable from Kubernetes?
EDIT: To install the Docker registry, run the following commands and follow the accepted answer.
mkdir registry && cd registry && mkdir certs && cd certs
openssl genrsa 1024 > domain.key
chmod 400 domain.key
openssl req -new -x509 -nodes -sha1 -days 365 -key domain.key -out domain.crt -subj "/C=FR/ST=France/L=Lannion/O=TGI/CN=OrangeFactoryBox"
cd .. && mkdir auth
sudo apt-get install apache2-utils -y
htpasswd -Bbn username password > auth/htpasswd
cd ..
docker run -d \
--restart=always \
--name registry \
-v `pwd`/auth:/auth \
-v `pwd`/certs:/certs \
-e REGISTRY_AUTH=htpasswd \
-e REGISTRY_AUTH_HTPASSWD_REALM="Registry Realm" \
-e REGISTRY_AUTH_HTPASSWD_PATH=/auth/htpasswd \
-e REGISTRY_HTTP_ADDR=0.0.0.0:5000 \
-e REGISTRY_HTTP_TLS_CERTIFICATE=/certs/domain.crt \
-e REGISTRY_HTTP_TLS_KEY=/certs/domain.key \
-p 5000:5000 \
registry:2
sudo docker login -u username -p password localhost:5000

Assumption: the Docker server where you tested the pull and the Kubernetes
nodes are on the same private subnet, 192.168.1.0/24.
http: server gave HTTP response to HTTPS client
So apparently your private Docker registry uses HTTP, not HTTPS. Kubernetes expects the registry to present a valid SSL certificate. On each node in your Kubernetes cluster, you will need to explicitly tell Docker to treat this registry as an insecure registry; after this change you will have to restart the Docker service as well.
Kubernetes: Failed to pull image. Server gave HTTP response to HTTPS client.
Add
{ "insecure-registries":["192.168.1.37:5000"] }
to the daemon.json file at /etc/docker/daemon.json.
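If you have several nodes, that edit can be scripted. A minimal sketch, assuming jq is installed; CONF defaults to a local file here for illustration, but on a real node it would be /etc/docker/daemon.json, edited as root:

```shell
# Sketch: merge an insecure-registries entry into a Docker daemon.json.
# Assumes jq is installed. CONF is a local file for illustration; on a
# real node it would be /etc/docker/daemon.json (edited as root).
CONF="${CONF:-daemon.json}"
REG="192.168.1.37:5000"   # example registry address from the question
[ -f "$CONF" ] || echo '{}' > "$CONF"
# Append the registry to any existing list, de-duplicating entries.
jq --arg reg "$REG" \
  '."insecure-registries" = ((."insecure-registries" // []) + [$reg] | unique)' \
  "$CONF" > "$CONF.tmp" && mv "$CONF.tmp" "$CONF"
cat "$CONF"
# On the real node, follow up with: sudo systemctl restart docker
```

Remember this has to happen on every node that pulls from the registry, not just the one where you ran docker push.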
You will also need to define the imagePullSecrets in your namespace and use it in your deployment/pod spec
First create the secret from your <path/to/.docker/config.json> using:
kubectl create secret generic regcred \
--from-file=.dockerconfigjson=<path/to/.docker/config.json> \
--type=kubernetes.io/dockerconfigjson
Then refer to this secret in your pod yaml
apiVersion: v1
kind: Pod
metadata:
  name: private-reg
spec:
  containers:
  - name: private-reg-container
    image: <your-private-image>
  imagePullSecrets:
  - name: regcred
Reference: https://kubernetes.io/docs/tasks/configure-pod-container/pull-image-private-registry/
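If you prefer to keep the secret as version-controllable YAML rather than creating it imperatively, the .dockerconfigjson payload can be rendered locally. A sketch with hypothetical credentials (REGISTRY, USERNAME, and PASSWORD are placeholder values, not taken from the question):

```shell
# Sketch: render a kubernetes.io/dockerconfigjson Secret manifest.
# REGISTRY/USERNAME/PASSWORD are placeholder values.
REGISTRY="192.168.1.37:5000"
USERNAME="username"
PASSWORD="password"
AUTH=$(printf '%s:%s' "$USERNAME" "$PASSWORD" | base64)
# -w0 disables line wrapping (GNU coreutils base64).
DOCKERCFG=$(printf '{"auths":{"%s":{"auth":"%s"}}}' "$REGISTRY" "$AUTH" | base64 -w0)
cat <<EOF
apiVersion: v1
kind: Secret
metadata:
  name: regcred
type: kubernetes.io/dockerconfigjson
data:
  .dockerconfigjson: $DOCKERCFG
EOF
```

Piping that output to kubectl apply -f - yields the same regcred secret as the kubectl create secret command above.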

Related

Unable to connect internet/google.com from pod. Docker and k8 are able to pull images

I am trying to learn Kubernetes.
I created a single-node Kubernetes cluster on Oracle Cloud using the steps here
cat /etc/resolv.conf
>> nameserver 169.254.169.254
kubectl run busybox --rm -it --image=busybox --restart=Never -- sh
cat /etc/resolv.conf
>> nameserver 10.33.0.10
nslookup google.com
>>Server: 10.33.0.10
Address: 10.33.0.10:53
;; connection timed out; no servers could be reached
ping 10.33.0.10
>>PING 10.33.0.10 (10.33.0.10): 56 data bytes
kubectl get svc -n kube-system -o wide
>> CLUSTER-IP - 10.33.0.10
kubectl logs --namespace=kube-system -l k8s-app=kube-dns
>>[ERROR] plugin/errors: 2 google.com. A: read udp 10.32.0.9:57385->169.254.169.254:53: i/o timeout
I am not able to tell whether this is a CoreDNS error or a pod networking error. Any direction would really help.
Kubernetes has deprecated Docker as a container runtime after v1.20.
The Kubernetes project decided to deprecate Docker as an underlying runtime in favor of runtimes that use the Container Runtime Interface (CRI) created for Kubernetes.
To bridge this gap, Mirantis and Docker agreed to partner on maintaining the shim code as a standalone project: cri-dockerd.
More details here.
sudo systemctl enable docker
# -- Installing cri-dockerd
VER=$(curl -s https://api.github.com/repos/Mirantis/cri-dockerd/releases/latest|grep tag_name | cut -d '"' -f 4)
echo $VER
wget https://github.com/Mirantis/cri-dockerd/releases/download/${VER}/cri-dockerd-${VER}-linux-arm64.tar.gz
tar xvf cri-dockerd-${VER}-linux-arm64.tar.gz
install -o root -g root -m 0755 cri-dockerd /usr/bin/cri-dockerd
# -- Verification
cri-dockerd --version
# -- Configure systemd units for cri-dockerd
wget https://raw.githubusercontent.com/Mirantis/cri-dockerd/master/packaging/systemd/cri-docker.service
wget https://raw.githubusercontent.com/Mirantis/cri-dockerd/master/packaging/systemd/cri-docker.socket
sudo cp cri-docker.socket cri-docker.service /etc/systemd/system/
sudo cp cri-docker.socket cri-docker.service /usr/lib/systemd/system/
sudo systemctl daemon-reload
sudo systemctl enable cri-docker.service
sudo systemctl enable --now cri-docker.socket
# -- Using cri-dockerd on new Kubernetes cluster
systemctl status docker | grep Active
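One aside: the grep/cut pipeline that extracts ${VER} is brittle if GitHub ever reformats its JSON. jq parses the field directly; a sketch, assuming jq is installed, shown against a canned response so it does not depend on network access:

```shell
# Sketch: extract a release tag from GitHub API JSON with jq
# instead of grep/cut. JSON below is a canned example response.
JSON='{"tag_name":"v0.3.4","name":"v0.3.4"}'
VER=$(printf '%s' "$JSON" | jq -r .tag_name)
echo "$VER"
# Against the live API it would be:
# VER=$(curl -s https://api.github.com/repos/Mirantis/cri-dockerd/releases/latest | jq -r .tag_name)
```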
I ran into a similar issue with almost the same scenario as described above. The accepted solution https://stackoverflow.com/a/72104194/1119570 is wrong. This is purely a networking issue and is not related to any EKS upgrade.
The root cause of our issue was that the Worker Node AWS EKS Linux 1.21 AMI had been hardened by our security department, which turned off the following setting in /etc/sysctl.conf:
net.ipv4.ip_forward = 0
After switching this setting to net.ipv4.ip_forward = 1 and rebooting the EC2 node, everything started working properly. Hope this helps!
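A quick way to audit a node for this is to read the kernel setting straight from /proc; a minimal sketch (assumes a Linux host):

```shell
# Sketch: check whether IPv4 forwarding is enabled on this node.
# Pod traffic routed through the node requires net.ipv4.ip_forward = 1.
F=/proc/sys/net/ipv4/ip_forward
VAL=$(cat "$F" 2>/dev/null || echo unknown)
if [ "$VAL" = "1" ]; then
  echo "ip_forward=1 (ok)"
else
  echo "ip_forward=$VAL; enable with: sudo sysctl -w net.ipv4.ip_forward=1"
fi
```

To persist the setting across reboots, set net.ipv4.ip_forward = 1 in /etc/sysctl.conf as described above.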

rancher rke up errors on etcd host health checks remote error: tls: bad certificate

rke --debug up --config cluster.yml
fails with health checks on etcd hosts with error:
DEBU[0281] [etcd] failed to check health for etcd host [x.x.x.x]: failed to get /health for host [x.x.x.x]: Get "https://x.x.x.x:2379/health": remote error: tls: bad certificate
Checking etcd healthchecks
for endpoint in $(docker exec etcd /bin/sh -c "etcdctl member list | cut -d, -f5"); do
  echo "Validating connection to ${endpoint}/health";
  curl -w "\n" --cacert $(docker exec etcd printenv ETCDCTL_CACERT) --cert $(docker exec etcd printenv ETCDCTL_CERT) --key $(docker exec etcd printenv ETCDCTL_KEY) "${endpoint}/health";
done
Running on that master node
Validating connection to https://x.x.x.x:2379/health
{"health":"true"}
Validating connection to https://x.x.x.x:2379/health
{"health":"true"}
Validating connection to https://x.x.x.x:2379/health
{"health":"true"}
Validating connection to https://x.x.x.x:2379/health
{"health":"true"}
You can also run the check manually to see whether an endpoint responds correctly:
curl -w "\n" --cacert /etc/kubernetes/ssl/kube-ca.pem --cert /etc/kubernetes/ssl/kube-etcd-x-x-x-x.pem --key /etc/kubernetes/ssl/kube-etcd-x-x-x-x-key.pem https://x.x.x.x:2379/health
Checking my self-signed certificate hashes
# md5sum /etc/kubernetes/ssl/kube-ca.pem
f5b358e771f8ae8495c703d09578eb3b /etc/kubernetes/ssl/kube-ca.pem
# for key in $(cat /home/kube/cluster.rkestate | jq -r '.desiredState.certificatesBundle | keys[]'); do echo $(cat /home/kube/cluster.rkestate | jq -r --arg key $key '.desiredState.certificatesBundle[$key].certificatePEM' | sed '$ d' | md5sum) $key; done | grep kube-ca
f5b358e771f8ae8495c703d09578eb3b - kube-ca
versions on my master node
Debian GNU/Linux 10
rke version v1.3.1
docker version Version: 20.10.8
kubectl v1.21.5
v1.21.5-rancher1-1
I think my cluster.rkestate has gone bad; are there any other locations where the rke tool checks for certificates?
Currently I cannot do anything with this production cluster and I want to avoid downtime. I experimented with different scenarios on a testing cluster; as a last resort I could recreate the cluster from scratch, but maybe I can still fix it...
rke remove && rke up
rke util get-state-file helped me to reconstruct the bad cluster.rkestate file,
and I was able to successfully run rke up and add a new master node to fix the whole situation.
The problem can be solved by doing the following steps:
Remove the kube_config_cluster.yml file from the directory where you run the rke up command (since some data is missing on your K8s nodes).
Remove cluster.rkestate file.
Re-run rke up command.

Rancher Server v2.x expired certificates

My certificates for rancher server expired and now I can not log in to UI anymore to manage my k8s clusters.
Error:
2021-05-26 00:57:52.437334 I | http: TLS handshake error from 127.0.0.1:43238: remote error: tls: bad certificate
2021/05/26 00:57:52 [INFO] Waiting for server to become available: Get https://127.0.0.1:6443/version?timeout=30s: x509: certificate has expired or is not yet valid
So what I did was roll back the date on the RancherOS machine that runs the Rancher Server container. After that I restarted the container, and it refreshed the certificates. I checked with:
for i in `ls /var/lib/rancher/k3s/server/tls/*.crt`; do echo $i; openssl x509 -enddate -noout -in $i; done
Since I was now able to log into the UI, I forced a certificate rotation on the k8s cluster.
But once the date is reset to the current time, I still get the same error and cannot log in to the Rancher Server UI.
What am I missing here?
This was the missing piece: https://github.com/rancher/rancher/issues/26984#issuecomment-818770519
Deleting the dynamic-cert.json file and running kubectl delete secret was the fix. I recently had to do this again, and this is how I do it now:
sudo docker exec -it <container_id> sh -c "rm /var/lib/rancher/k3s/server/tls/dynamic-cert.json" && \
sudo docker exec -it <container_id> k3s kubectl --insecure-skip-tls-verify=true delete secret -n kube-system k3s-serving && \
sudo docker restart <container_id>
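To catch expiry earlier next time, the openssl check can be wrapped in a small helper that reports remaining validity in days. A sketch assuming openssl and GNU date ("date -d"); the demo generates a throwaway certificate rather than touching Rancher's files:

```shell
# Sketch: report how many days remain before a PEM certificate expires.
# Assumes openssl and GNU date ("date -d").
cert_days_left() {
  end=$(openssl x509 -enddate -noout -in "$1" | cut -d= -f2)
  echo $(( ($(date -d "$end" +%s) - $(date +%s)) / 86400 ))
}

# Demo against a throwaway self-signed certificate valid for 365 days:
openssl req -x509 -newkey rsa:2048 -nodes -keyout /tmp/demo.key \
  -out /tmp/demo.crt -days 365 -subj "/CN=demo" 2>/dev/null
cert_days_left /tmp/demo.crt
```

Pointing the helper at each file under /var/lib/rancher/k3s/server/tls/*.crt would flag anything close to expiry before it locks you out.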

Error installing TimescaleDB with K8S / Helm : MountVolume.SetUp failed for volume "certificate" : secret "timescaledb-certificate" not found

I just tried to install timescaleDB Single with Helm in minikube on Ubuntu 20.04.
After installing via:
helm install timescaledb timescaledb/timescaledb-single --namespace espace-client-v2
I got the message:
➜ ~ helm install timescaledb timescaledb/timescaledb-single --namespace espace-client-v2
NAME: timescaledb
LAST DEPLOYED: Fri Aug 7 17:17:59 2020
NAMESPACE: espace-client-v2
STATUS: deployed
REVISION: 1
TEST SUITE: None
NOTES:
TimescaleDB can be accessed via port 5432 on the following DNS name from within your cluster:
timescaledb.espace-client-v2.svc.cluster.local
To get your password for superuser run:
# superuser password
PGPASSWORD_POSTGRES=$(kubectl get secret --namespace espace-client-v2 timescaledb-credentials -o jsonpath="{.data.PATRONI_SUPERUSER_PASSWORD}" | base64 --decode)
# admin password
PGPASSWORD_ADMIN=$(kubectl get secret --namespace espace-client-v2 timescaledb-credentials -o jsonpath="{.data.PATRONI_admin_PASSWORD}" | base64 --decode)
To connect to your database, chose one of these options:
1. Run a postgres pod and connect using the psql cli:
# login as superuser
kubectl run -i --tty --rm psql --image=postgres \
--env "PGPASSWORD=$PGPASSWORD_POSTGRES" \
--command -- psql -U postgres \
-h timescaledb.espace-client-v2.svc.cluster.local postgres
# login as admin
kubectl run -i --tty --rm psql --image=postgres \
--env "PGPASSWORD=$PGPASSWORD_ADMIN" \
--command -- psql -U admin \
-h timescaledb.espace-client-v2.svc.cluster.local postgres
2. Directly execute a psql session on the master node
MASTERPOD="$(kubectl get pod -o name --namespace espace-client-v2 -l release=timescaledb,role=master)"
kubectl exec -i --tty --namespace espace-client-v2 ${MASTERPOD} -- psql -U postgres
It seemed to have installed well.
But then, when executing:
PGPASSWORD_POSTGRES=$(kubectl get secret --namespace espace-client-v2 timescaledb-credentials -o jsonpath="{.data.PATRONI_SUPERUSER_PASSWORD}" | base64 --decode)
Error from server (NotFound): secrets "timescaledb-credentials" not found
After that, I realized the pod had not even been created, and I got the following errors:
MountVolume.SetUp failed for volume "certificate" : secret "timescaledb-certificate" not found
Unable to attach or mount volumes: unmounted volumes=[certificate], unattached volumes=[storage-volume wal-volume patroni-config timescaledb-scripts certificate socket-directory timescaledb-token-svqqf]: timed out waiting for the condition
What should I do ?
I got it working. The page https://github.com/timescale/timescaledb-kubernetes doesn't give many details about the installation process, but you can go here:
https://github.com/timescale/timescaledb-kubernetes/tree/master/charts/timescaledb-single
I had to use kustomize to generate content:
./generate_kustomization.sh my-release
which generated several files:
credentials.conf kustomization.yaml pgbackrest.conf timescaledbMap.yaml tls.crt tls.key
then I did:
kubectl kustomize ./
which generated a k8s yml file, which I saved with the name timescaledbMap.yaml
Finally, I did:
kubectl apply -f timescaledbMap.yaml
Then it created all the necessary secrets, and I could install the chart. Hope it helps others.

Failed to setup a HA etcd cluster

I want to set up an etcd cluster running on multiple nodes. I have two Ubuntu 18.04 machines running on Hyper-V.
I followed this guide on the official kubernetes site:
https://kubernetes.io/docs/setup/production-environment/tools/kubeadm/setup-ha-etcd-with-kubeadm/
Therefore, I modified the scripts from the guide and executed them on HOST0 and HOST1:
export HOST0=192.168.101.90
export HOST1=192.168.101.91
mkdir -p /tmp/${HOST0}/ /tmp/${HOST1}/
ETCDHOSTS=(${HOST0} ${HOST1})
NAMES=("infra0" "infra1")
for i in "${!ETCDHOSTS[@]}"; do
HOST=${ETCDHOSTS[$i]}
NAME=${NAMES[$i]}
cat << EOF > /tmp/${HOST}/kubeadmcfg.yaml
apiVersion: "kubeadm.k8s.io/v1beta2"
kind: ClusterConfiguration
etcd:
  local:
    serverCertSANs:
    - "${HOST}"
    peerCertSANs:
    - "${HOST}"
    extraArgs:
      initial-cluster: ${NAMES[0]}=https://${ETCDHOSTS[0]}:2380,${NAMES[1]}=https://${ETCDHOSTS[1]}:2380
      initial-cluster-state: new
      name: ${NAME}
      listen-peer-urls: https://${HOST}:2380
      listen-client-urls: https://${HOST}:2379
      advertise-client-urls: https://${HOST}:2379
      initial-advertise-peer-urls: https://${HOST}:2380
EOF
done
After that, I executed this command on HOST0
kubeadm init phase certs etcd-ca
I created all the necessary certificates on HOST0:
# cleanup non-reusable certificates
find /etc/kubernetes/pki -not -name ca.crt -not -name ca.key -type f -delete
kubeadm init phase certs etcd-peer --config=/tmp/${HOST1}/kubeadmcfg.yaml
kubeadm init phase certs etcd-healthcheck-client --config=/tmp/${HOST1}/kubeadmcfg.yaml
kubeadm init phase certs apiserver-etcd-client --config=/tmp/${HOST1}/kubeadmcfg.yaml
cp -R /etc/kubernetes/pki /tmp/${HOST1}/
find /etc/kubernetes/pki -not -name ca.crt -not -name ca.key -type f -delete
kubeadm init phase certs etcd-server --config=/tmp/${HOST0}/kubeadmcfg.yaml
kubeadm init phase certs etcd-peer --config=/tmp/${HOST0}/kubeadmcfg.yaml
kubeadm init phase certs etcd-healthcheck-client --config=/tmp/${HOST0}/kubeadmcfg.yaml
kubeadm init phase certs apiserver-etcd-client --config=/tmp/${HOST0}/kubeadmcfg.yaml
# No need to move the certs because they are for HOST0
# clean up certs that should not be copied off this host
find /tmp/${HOST1} -name ca.key -type f -delete
After that, I copied the files to the second etcd node (HOST1). Before that, I had created a user mbesystem with root privileges:
USER=mbesystem
HOST=${HOST1}
scp -r /tmp/${HOST}/* ${USER}@${HOST}:
ssh ${USER}@${HOST}
USER@HOST $ sudo -Es
root@HOST $ chown -R root:root pki
root@HOST $ mv pki /etc/kubernetes/
I checked that all the files were there on HOST0 and HOST1.
On HOST0 I started the etcd cluster using:
kubeadm init phase etcd local --config=/tmp/192.168.101.90/kubeadmcfg.yaml
On Host1 I started using:
kubeadm init phase etcd local --config=/home/mbesystem/kubeadmcfg.yaml
After I executed:
docker run --rm -it \
--net host \
-v /etc/kubernetes:/etc/kubernetes k8s.gcr.io/etcd:3.4.3-0 etcdctl \
--cert /etc/kubernetes/pki/etcd/peer.crt \
--key /etc/kubernetes/pki/etcd/peer.key \
--cacert /etc/kubernetes/pki/etcd/ca.crt \
--endpoints https://192.168.101.90:2379 endpoint health --cluster
I discovered my cluster is not healthy; I received a connection refused error.
I can't figure out what went wrong. Any help will be appreciated.
I've looked into it, reproduced what was in the link that you provided: Kubernetes.io: Setup ha etcd with kubeadm and managed to make it work.
Here is some explanation/troubleshooting steps/tips etc.
First of all, etcd should be configured with an odd number of nodes; that is, it should be created as a 3- or 5-node cluster.
Why an odd number of cluster members?
An etcd cluster needs a majority of nodes, a quorum, to agree on updates to the cluster state. For a cluster with n members, quorum is (n/2)+1. For any odd-sized cluster, adding one node will always increase the number of nodes necessary for quorum. Although adding a node to an odd-sized cluster appears better since there are more machines, the fault tolerance is worse since exactly the same number of nodes may fail without losing quorum but there are more nodes that can fail. If the cluster is in a state where it can't tolerate any more failures, adding a node before removing nodes is dangerous because if the new node fails to register with the cluster (e.g., the address is misconfigured), quorum will be permanently lost.
-- Github.com: etcd documentation
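The quorum arithmetic is worth tabulating, since it shows both why an even member count buys nothing and why the 2-node cluster in the question is fragile:

```shell
# quorum = floor(n/2) + 1; fault tolerance = n - quorum
for n in 1 2 3 4 5; do
  q=$(( n / 2 + 1 ))
  echo "members=$n quorum=$q tolerated_failures=$(( n - q ))"
done
```

A 2-member cluster has a quorum of 2, so it tolerates zero failures; 3 and 4 members both tolerate exactly one, which is why 3 or 5 members is the usual recommendation.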
Additionally here are some troubleshooting steps:
Check if Docker is running. You can check it by running the following command (on a systemd-based OS):
$ systemctl show --property ActiveState docker
Check if etcd container is running properly with:
$ sudo docker ps
If it is running, check the logs of the etcd container with:
$ sudo docker logs ID_OF_CONTAINER
How I've managed to make it work:
Assuming 2 Ubuntu 18.04 servers with IP addresses of:
10.156.0.15 and name: etcd-1
10.156.0.16 and name: etcd-2
Additionally:
SSH keys configured for root access
DNS resolution working for both of the machines ($ ping etcd-1)
Steps:
Pre-configuration before the official guide.
I did all of the below configuration with the usage of root account
Configure the kubelet to be a service manager for etcd.
Create configuration files for kubeadm.
Generate the certificate authority.
Create certificates for each member
Copy certificates and kubeadm configs.
Create the static pod manifests.
Check the cluster health.
Pre-configuration before the official guide
Pre-configuration of these machines was done with the Ansible playbooks from this Stack Overflow post:
Stackoverflow.com: 3 kubernetes clusters 1 base on local machine
You can also follow official documentation: Kubernetes.io: Install kubeadm
Configure the kubelet to be a service manager for etcd.
Run below commands on etcd-1 and etcd-2 with root account.
cat << EOF > /etc/systemd/system/kubelet.service.d/20-etcd-service-manager.conf
[Service]
ExecStart=
# Replace "systemd" with the cgroup driver of your container runtime. The default value in the kubelet is "cgroupfs".
ExecStart=/usr/bin/kubelet --address=127.0.0.1 --pod-manifest-path=/etc/kubernetes/manifests --cgroup-driver=systemd
Restart=always
EOF
$ systemctl daemon-reload
$ systemctl restart kubelet
Create configuration files for kubeadm.
Create your configuration file on your etcd-1 node.
Here is modified script that will create kubeadmcfg.yaml for only 2 nodes:
export HOST0=10.156.0.15
export HOST1=10.156.0.16
# Create temp directories to store files that will end up on other hosts.
mkdir -p /tmp/${HOST0}/ /tmp/${HOST1}/
ETCDHOSTS=(${HOST0} ${HOST1})
NAMES=("etcd-1" "etcd-2")
for i in "${!ETCDHOSTS[@]}"; do
HOST=${ETCDHOSTS[$i]}
NAME=${NAMES[$i]}
cat << EOF > /tmp/${HOST}/kubeadmcfg.yaml
apiVersion: "kubeadm.k8s.io/v1beta2"
kind: ClusterConfiguration
etcd:
  local:
    serverCertSANs:
    - "${HOST}"
    peerCertSANs:
    - "${HOST}"
    extraArgs:
      initial-cluster: ${NAMES[0]}=https://${ETCDHOSTS[0]}:2380,${NAMES[1]}=https://${ETCDHOSTS[1]}:2380
      initial-cluster-state: new
      name: ${NAME}
      listen-peer-urls: https://${HOST}:2380
      listen-client-urls: https://${HOST}:2379
      advertise-client-urls: https://${HOST}:2379
      initial-advertise-peer-urls: https://${HOST}:2380
EOF
done
Pay special attention to:
export HOSTX on the top of the script. Paste the IP addresses of your machines there.
NAMES=("etcd-1" "etcd-2"). Paste the names of your machines (hostname) there.
Run this script from root account and check if it created files in /tmp/IP_ADDRESS directory.
Generate the certificate authority
Run below command from root account on your etcd-1 node:
$ kubeadm init phase certs etcd-ca
Create certificates for each member
Below is a part of the script which is responsible for creating certificates for each member of etcd cluster. Please modify the HOST0 and HOST1 variables.
#!/bin/bash
HOST0=10.156.0.15
HOST1=10.156.0.16
kubeadm init phase certs etcd-server --config=/tmp/${HOST1}/kubeadmcfg.yaml
kubeadm init phase certs etcd-peer --config=/tmp/${HOST1}/kubeadmcfg.yaml
kubeadm init phase certs etcd-healthcheck-client --config=/tmp/${HOST1}/kubeadmcfg.yaml
kubeadm init phase certs apiserver-etcd-client --config=/tmp/${HOST1}/kubeadmcfg.yaml
cp -R /etc/kubernetes/pki /tmp/${HOST1}/
find /etc/kubernetes/pki -not -name ca.crt -not -name ca.key -type f -delete
kubeadm init phase certs etcd-server --config=/tmp/${HOST0}/kubeadmcfg.yaml
kubeadm init phase certs etcd-peer --config=/tmp/${HOST0}/kubeadmcfg.yaml
kubeadm init phase certs etcd-healthcheck-client --config=/tmp/${HOST0}/kubeadmcfg.yaml
kubeadm init phase certs apiserver-etcd-client --config=/tmp/${HOST0}/kubeadmcfg.yaml
# No need to move the certs because they are for HOST0
Run above script from root account and check if there is pki directory inside /tmp/10.156.0.16/.
There shouldn't be any pki directory inside /tmp/10.156.0.15/ as it's already in place.
Copy certificates and kubeadm configs.
Copy your kubeadmcfg.yaml of etcd-1 from /tmp/10.156.0.15 to root directory with:
$ mv /tmp/10.156.0.15/kubeadmcfg.yaml /root/
Copy the content of /tmp/10.156.0.16 from your etcd-1 to your etcd-2 node to /root/ directory:
$ scp -r /tmp/10.156.0.16/* root@10.156.0.16:
After that, check that the files were copied correctly and have the correct permissions, then move the pki folder to /etc/kubernetes/ with this command on etcd-2:
$ mv /root/pki /etc/kubernetes/
Create the static pod manifests.
Run below command on etcd-1 and etcd-2:
$ kubeadm init phase etcd local --config=/root/kubeadmcfg.yaml
All should be running now.
Check the cluster health.
Run below command to check cluster health on etcd-1.
docker run --rm -it --net host -v /etc/kubernetes:/etc/kubernetes k8s.gcr.io/etcd:3.4.3-0 etcdctl --cert /etc/kubernetes/pki/etcd/peer.crt --key /etc/kubernetes/pki/etcd/peer.key --cacert /etc/kubernetes/pki/etcd/ca.crt --endpoints https://10.156.0.15:2379 endpoint health --cluster
Modify:
--endpoints https://10.156.0.15:2379 with correct IP address of etcd-1
It should give you a message like this:
https://10.156.0.15:2379 is healthy: successfully committed proposal: took = 26.308693ms
https://10.156.0.16:2379 is healthy: successfully committed proposal: took = 26.614373ms
The message above shows that etcd is working correctly, but please be aware of the caveat about running an even number of nodes.
Please let me know if you have any questions to that.