Pull image from private docker registry in AWS EKS Autoscaler worker nodes - kubernetes

I'm using AWS EKS with the Auto Scaler for the worker nodes. I have a private Artifactory Docker registry.
Now, in order to pull Docker images from the private registry, I've read many documents, including the Kubernetes docs on how to pull an image from a private Docker registry.
There are three steps in the solution (sketched below):
Create a Kubernetes secret which contains the Docker registry credentials
Add "insecure-registries":["privateRegistryAddress:port"] to /etc/docker/daemon.json
Restart the Docker service
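For reference, a minimal sketch of these three steps on a node; the registry address, secret name, and credentials below are placeholders:

# Step 1: a Kubernetes secret holding the registry credentials
kubectl create secret docker-registry regcred \
  --docker-server=privateRegistryAddress:port \
  --docker-username=<user> \
  --docker-password=<password>
# Steps 2 and 3: mark the registry as insecure on the node, then restart Docker
# (note: this overwrites any existing daemon.json settings)
echo '{"insecure-registries":["privateRegistryAddress:port"]}' | sudo tee /etc/docker/daemon.json
sudo systemctl restart docker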
I've manually SSHed into the worker nodes and ran steps 2 and 3, which works temporarily. But when the EKS Auto Scaler finds that a worker node is no longer in use, it kills it and creates a new one as needed, and on this new worker node "insecure-registries":["privateRegistryAddress:port"] is not added to /etc/docker/daemon.json, due to which pod scheduling fails.
There are two solutions I can think of here:
Configure an AWS EC2 AMI which contains "insecure-registries":["privateRegistryAddress:port"] in /etc/docker/daemon.json by default, and use that image in the Auto Scaler configuration
Create a pod which has node-level permission to edit the mentioned file and restart the Docker service, though I doubt this works: if the Docker service is restarted, that pod itself would go down with it
Please advise. Thanks.

Solved this with the first approach I mentioned in the question.
First, of course, created a kubectl secret to log in to the private registry
SSHed into a Kubernetes worker node and added "insecure-registries":["privateRegistryAddress:port"] to /etc/docker/daemon.json
Created an AMI image out of that node
Updated the EC2 launch template with the new AMI and set the new template version as default
Updated the EC2 Auto Scaling group with the new launch template version
Killed the previous worker nodes and let the Auto Scaling group create new ones
and voila!! :)
Now whenever EKS, through the Auto Scaling group, adds or removes EC2 instances, they will be able to download Docker images from the private Docker registry.
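For anyone scripting this, the AMI and launch-template steps can be done with the AWS CLI along these lines; all IDs and names below are placeholders:

# bake an AMI from the worker node that already has the daemon.json change
aws ec2 create-image --instance-id <worker-instance-id> --name eks-worker-insecure-registry
# create a new launch template version pointing at the new AMI, make it default
aws ec2 create-launch-template-version --launch-template-id <lt-id> \
  --source-version 1 --launch-template-data '{"ImageId":"<new-ami-id>"}'
aws ec2 modify-launch-template --launch-template-id <lt-id> --default-version 2
# point the Auto Scaling group at the new template version
aws autoscaling update-auto-scaling-group --auto-scaling-group-name <asg-name> \
  --launch-template LaunchTemplateId=<lt-id>,Version='2'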

Related

Insecure registry containerd in Kubernetes

Hi everyone,
I'm trying to build a k8s cluster (v1.20) at my company.
I have some worker nodes.
And I have already deployed some services.
Now I have a task to create a custom image. The main question: where should I store the images? I set up a local Docker registry, but containerd can't pull images from an insecure registry.
I found a solution online: update containerd's config.toml to set an endpoint for the Docker registry that uses "http". To my mind, it is inconvenient to add this section on every new node in the k8s cluster.
Could you help me find a more elegant solution?
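For reference, the per-node config.toml change mentioned above looks roughly like this on containerd versions of the k8s 1.20 era; the registry address is a placeholder:

# append a mirror entry that points at the registry over plain http
sudo tee -a /etc/containerd/config.toml <<'EOF'
[plugins."io.containerd.grpc.v1.cri".registry.mirrors."myregistry:5000"]
  endpoint = ["http://myregistry:5000"]
EOF
sudo systemctl restart containerd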

Migrating from the Docker runtime to the Containerd runtime on a specific NODE kubernetes

I would like to know if it's possible to apply the --image-type COS_CONTAINERD parameter via the Google Cloud command line, but only on a specific node.
I found this command:
gcloud container clusters upgrade CLUSTER_NAME --image-type COS_CONTAINERD [--node-pool POOL_NAME]
But it will update my entire node pool, so it will create a disturbance in my web services.
Every node in my pool will be updated at the same time (destroyed & re-created).
Is there another way?
I would like to know if you have had the same issue; it would help me during this migration.
Thanks for sharing your experience.
The command you are mentioning will update all the nodes within the node pool.
I would suggest creating a new node pool on your cluster with the COS_CONTAINERD image type, and once the new node pool with cos-containerd is up, migrating your workload to the new pool's node(s) by following this process. This will also help you manage the downtime.
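A sketch of that flow, assuming placeholder cluster, pool, and node names:

# create a new pool with the containerd image type
gcloud container node-pools create containerd-pool \
  --cluster CLUSTER_NAME --image-type COS_CONTAINERD
# then move workloads off the old pool, one node at a time
# (on kubectl versions before 1.20, the drain flag is --delete-local-data)
kubectl cordon OLD_NODE_NAME
kubectl drain OLD_NODE_NAME --ignore-daemonsets --delete-emptydir-data
# finally delete the old pool
gcloud container node-pools delete OLD_POOL_NAME --cluster CLUSTER_NAME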

Kubernetes V1.16.8 doesn't support 'node-role' label using "--node-labels=node-role.kubernetes.io/master="

Upgrading a kube-aws v1.15.5 cluster to the next version, v1.16.8.
Use Case:
I want to keep the same node labels for the master and worker nodes as I was using in v1.15.
When I tried to upgrade the cluster to v1.16, --node-labels was restricted from using 'node-role' labels.
If I keep the node role as "node-role.kubernetes.io/master", the kubelet fails to start after the upgrade. If I remove the label, the kubectl get nodes output shows <none> for the upgraded node.
How do I reproduce?
Before the upgrade I took a backup of the kubelet config (cp /etc/sysconfig/kubelet /etc/sysconfig/kubelet-bkup), removed "-role" from it, and once the upgrade completed, restored the edited file (mv /etc/sysconfig/kubelet-bkup /etc/sysconfig/kubelet). Now I am able to see the node role as Master/Worker even after a kubelet service restart.
The problem I'm facing now:
Though I performed the upgrade on the existing cluster successfully, the cluster is running in AWS under the kube-aws model, so the ASG spins up a new node whenever the Cluster Autoscaler triggers it.
But the new node fails to join the cluster, since the node label "node-role.kubernetes.io/master" still exists in the code base.
How can I add the node-role dynamically during ASG scale-out? Any solution would be appreciated.
Note:
(kubeadm, kubelet, kubectl) - v1.16.8
I have sorted out the issue. I created a Python program that watches node events. Whenever the ASG spins up a new node, after it joins the cluster the node has an empty role (""), and the Python code then adds the appropriate label to the node dynamically.
I have also built a Docker image around that Python script; it runs as a pod deployed into the cluster and does the job of labelling the new nodes.
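Note that the 1.16 restriction only stops the kubelet from applying node-role labels to itself; a controller (or an admin) with sufficient RBAC permissions can still set them, which is essentially what the watcher pod does. The manual equivalent for one node would be something like:

# apply a node-role label from outside the kubelet; node name is a placeholder
kubectl label node <node-name> node-role.kubernetes.io/worker=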
See my solution on GitHub:
https://github.com/kubernetes/kubernetes/issues/91664
I have created it as a Docker image and it is publicly available:
https://hub.docker.com/r/shaikjaffer/node-watcher
Thanks,
Jaffer

GKE - Upgrading cluster master after cluster creation completes

Once we increase the load using a JMeter client, my deployed service is interrupted, and the GCP/GKE console says:
Upgrading cluster master
The values shown below are going to change soon.
And my kubectl client throws this error during the upgrade:
Unable to connect to the server: dial tcp 35.236.238.66:443: connectex: No connection could be made because the target machine actively refused it.
How can I stop this upgrade or prevent my service interruption? If the service is interrupted, there is no benefit to this autoscaling. I am new to GKE, so please let me know if I am missing any configuration or parameter here.
I am using this command to create my cluster:
gcloud container clusters create ajeet-gke --zone us-east4-b --node-locations us-east4-b --machine-type n1-standard-8 --num-nodes 1 --enable-autoscaling --min-nodes 4 --max-nodes 16
It is not upgrading the k8s version: it works fine with a smaller load, but as I increase the load the cluster starts an upgrade of the master. So it looks like the master is resizing itself for more nodes; after the upgrade I can see more nodes on the GCP console. https://github.com/terraform-providers/terraform-provider-google/issues/3385
The command below says autoscaling is not enabled on the instance group:
> gcloud compute instance-groups managed list
NAME                                  AUTOSCALED  LOCATION    SCOPE
ajeet-gke-cluster-default-pool-4***0  no          us-east4-b  zone
Workaround
Sorry, I forgot to update it here; I found a workaround to fix it. After splitting the cluster creation command into two steps, the cluster autoscales without restarting the master node:
gcloud container clusters create ajeet-ggs --zone us-east4-b --node-locations us-east4-b --machine-type n1-standard-8 --num-nodes 1
gcloud container clusters update ajeet-ggs --enable-autoscaling --min-nodes 1 --max-nodes 10 --zone us-east4-b --node-pool default-pool
To prevent this, you should always create your cluster with the cluster version hardcoded to the latest available version.
See the documentation: https://cloud.google.com/kubernetes-engine/docs/concepts/cluster-architecture#master
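A sketch of such a creation command with a pinned version; the version string below is only an example, pick the latest one offered in your zone:

# list the versions currently available in the zone
gcloud container get-server-config --zone us-east4-b
# create the cluster with the version pinned explicitly
gcloud container clusters create ajeet-gke --zone us-east4-b \
  --cluster-version 1.14.10-gke.27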
This means that Google is managing the master: if your master is not up to date, it will be upgraded to the latest version, which allows Google to limit the number of versions it currently has to manage. https://cloud.google.com/kubernetes-engine/docs/concepts/regional-clusters
Now, why do you have a service interruption during the update? Because you are in zonal mode with only one master. To prevent this, you should use a regional cluster with more than one master, allowing for a clean rolling update.
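For example, a regional cluster (with replicated masters across the region's zones) can be created along these lines; the cluster name is a placeholder:

# --num-nodes is per zone in a regional cluster
gcloud container clusters create my-cluster --region us-east4 --num-nodes 1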
The master won't resize the nodes unless the autoscaling feature is enabled on the node pool.
As mentioned in the answer above, this is a feature at the node-pool level. From the description of the issue, it does seem like autoscaling is enabled on your node pool, in which case GKE's cluster autoscaler automatically resizes the cluster based on the demands of your workloads (i.e. when there are pods that cannot be scheduled due to resource shortages such as CPU).
Additionally, Kubernetes cluster autoscaling does not use the Managed Instance Group autoscaler. It runs a cluster-autoscaler controller on the Kubernetes master that uses Kubernetes-specific signals to scale your nodes.
It is therefore highly recommended not to use Compute Engine's autoscaling feature on instance groups created by Kubernetes Engine (or to rely on the autoscaling status shown by the MIG).
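To check autoscaling where it actually lives, on the GKE node pool rather than the MIG, something like this should work (cluster name and zone taken from the question, pool name assumed to be default-pool):

gcloud container node-pools describe default-pool \
  --cluster ajeet-gke --zone us-east4-b --format="yaml(autoscaling)"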

How do I update a service in the cluster to use a new docker image

I have created a new Docker image that I want to use to replace the current one. The application runs on Kubernetes Engine on Google Cloud Platform.
I believe I am supposed to use the gcloud container clusters update command, although I struggle to see how it works and how I'm supposed to replace the old Docker image with the new one.
You may want to use kubectl to interact with your GKE cluster. The method of updating the image depends on how the Pod / Container was created.
For some example commands, see https://kubernetes.io/docs/reference/kubectl/cheatsheet/#updating-resources
For example, kubectl set image deployment/frontend www=image:v2 will do a rolling update of the "www" containers of the "frontend" deployment, updating the image.
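Using the cheatsheet example above, a typical update-and-verify sequence would be (deployment and image names are the cheatsheet's placeholders):

kubectl set image deployment/frontend www=image:v2
kubectl rollout status deployment/frontend   # watch the rolling update complete
kubectl rollout undo deployment/frontend     # roll back if the new image misbehaves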
Getting up and running on GKE: https://cloud.google.com/kubernetes-engine/docs/quickstart
You can use Container Registry [1] as a single place to manage Docker images.
Google Container Registry provides secure, private Docker repository storage on Google Cloud Platform. You can use gcloud to push [2] images to your registry, and then pull images using an HTTP endpoint from any machine.
You can also use Docker Hub repositories [3], which allow you to share container images with your team, customers, or the Docker community at large.
[1]https://cloud.google.com/container-registry/
[2]https://cloud.google.com/container-registry/docs/pushing-and-pulling
[3]https://docs.docker.com/docker-hub/repos/
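As a sketch of the push flow from [2], with placeholder project and image names:

# let docker authenticate against Container Registry
gcloud auth configure-docker
# tag the local image for the registry and push it
docker tag my-app gcr.io/my-project/my-app:v2
docker push gcr.io/my-project/my-app:v2
# then reference gcr.io/my-project/my-app:v2 in the kubectl set image command above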