Kubernetes | Docker | "no image found" error during rolling update

I have created an updated image with a new tag for a rolling update, but when I perform the update with this command: kubectl set image deployments/hello-node-1 hello-node-1=hello-node:v2
I get this error: kubelet, minikube Warning FailedSync Error syncing pod, skipping: failed to "StartContainer" for "hello-node-1" with ErrImagePull: "rpc error: code = 2 desc = Error: image library/hello-node not found"

It looks like you didn't set the image correctly. Did you push it to the correct repository? One way to test this is to create a new deployment that uses your newly created image.

You are referring to the wrong image. The error message shows that the kubelet is trying to pull hello-node:v2 as an official image from Docker Hub (library/...).
If you pushed your image to Docker Hub, prefix the image name with your Docker Hub username.
If it is in a private repository, prefix it with the registry hostname.
If you built the image locally on the node, make sure the imagePullPolicy in your Deployment is set to IfNotPresent, and make sure the image is actually present on every node the pod might be scheduled on.
For minikube, check out this post.
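To see why a bare hello-node:v2 ends up as library/hello-node, here is a simplified Python sketch of how Docker-compatible runtimes expand a short image reference: no registry host means Docker Hub (docker.io), a single-component name gets the library/ prefix, and a missing tag becomes latest. It is an approximation of the reference-parsing rules (the real parser also validates names), not the kubelet's actual code, and the function name normalize_image is hypothetical.

```python
def normalize_image(ref: str) -> str:
    """Expand a short image reference roughly the way Docker-compatible runtimes do."""
    digest = ""
    if "@" in ref:  # keep an optional @sha256:... digest untouched
        ref, digest = ref.split("@", 1)
        digest = "@" + digest
    # A tag is a ':' that appears after the last '/' (a ':' before it is a registry port)
    last_slash = ref.rfind("/")
    colon = ref.rfind(":")
    tag = ""
    if colon > last_slash:
        ref, tag = ref[:colon], ref[colon + 1:]
    if not tag and not digest:
        tag = "latest"  # default tag
    first, sep, rest = ref.partition("/")
    # The first component is a registry host only if it looks like one
    if sep and ("." in first or ":" in first or first == "localhost"):
        domain, path = first, rest
    else:
        domain, path = "docker.io", ref  # default registry: Docker Hub
    if domain == "docker.io" and "/" not in path:
        path = "library/" + path  # official-image namespace
    return f"{domain}/{path}" + (f":{tag}" if tag else "") + digest


if __name__ == "__main__":
    print(normalize_image("hello-node:v2"))         # docker.io/library/hello-node:v2
    print(normalize_image("myuser/hello-node:v2"))  # docker.io/myuser/hello-node:v2
```

This is why prefixing the name with your Docker Hub username or a registry hostname changes where the image is pulled from.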

Related

401 Unauthorized error while trying to pull image from Google Container Registry

I am using Google Container Registry (GCR) to push and pull Docker images. I have created a deployment in Kubernetes with 3 replicas. The deployment uses a Docker image pulled from GCR.
Out of the 3 replicas, 2 pull the image and run fine, but the third replica shows the error below and the pod's status remains "ImagePullBackOff" or "ErrImagePull":
"Failed to pull image "gcr.io/xxx:yyy": rpc error: code = Unknown desc = failed to pull and unpack image "gcr.io/xxx:yyy": failed to resolve reference "gcr.io/xxx:yyy": unexpected status code: 401 Unauthorized"
I am confused about why only one of the replicas shows the error while the other 2 run without any issue. Can anyone please clarify this?
ImagePullBackOff and ErrImagePull indicate that the image used by a container cannot be loaded from the image registry.
A 401 Unauthorized error can occur when you pull an image from a private Container Registry repository. To troubleshoot it:
Identify the node that runs the pod with kubectl describe pod POD_NAME | grep "Node:"
Verify that the node has the storage scope by running the command:
gcloud compute instances describe NODE_NAME --zone=COMPUTE_ZONE --format="flattened(serviceAccounts[].scopes)"
The node's access scope should contain at least one of the following:
serviceAccounts[0].scopes[0]: https://www.googleapis.com/auth/devstorage.read_only
serviceAccounts[0].scopes[0]: https://www.googleapis.com/auth/cloud-platform
If it doesn't, recreate the node pool that the node belongs to with sufficient scope. You cannot modify the scopes of existing nodes; you must recreate the nodes with the correct scope.
Create a new node pool with the gke-default scope with the following command:
gcloud container node-pools create NODE_POOL_NAME --cluster=CLUSTER_NAME --zone=COMPUTE_ZONE --scopes="gke-default"
Or create a new node pool with only the storage scope:
gcloud container node-pools create NODE_POOL_NAME --cluster=CLUSTER_NAME --zone=COMPUTE_ZONE --scopes="https://www.googleapis.com/auth/devstorage.read_only"
Refer to the link for more information on the troubleshooting process.
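The scope check in the second step can also be scripted. The Python sketch below parses output in the flattened format shown above (lines such as serviceAccounts[0].scopes[0]: URL) and reports whether at least one of the two sufficient scopes is present. It assumes the gcloud output has exactly that key: value shape; the function names are hypothetical. Running it against each node may also explain why only some replicas fail: a pod fails only if it lands on a node whose scopes are insufficient.

```python
# Either of these scopes is enough for the node to pull from Container Registry
REQUIRED_ANY = {
    "https://www.googleapis.com/auth/devstorage.read_only",
    "https://www.googleapis.com/auth/cloud-platform",
}


def scopes_from_flattened(output: str) -> set[str]:
    """Collect scope URLs from `--format="flattened(serviceAccounts[].scopes)"` output."""
    scopes = set()
    for line in output.splitlines():
        # Split at the first ':' only; the URL value itself contains ':'
        key, sep, value = line.partition(":")
        if sep and ".scopes[" in key:
            scopes.add(value.strip())
    return scopes


def can_pull_from_gcr(output: str) -> bool:
    """True if the node has at least one scope that allows pulling from GCR."""
    return bool(scopes_from_flattened(output) & REQUIRED_ANY)


if __name__ == "__main__":
    sample = "serviceAccounts[0].scopes[0]: https://www.googleapis.com/auth/logging.write"
    print(can_pull_from_gcr(sample))
```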
You need to set up a role that lets the cluster access GCR images for pulling and pushing; see https://github.com/GoogleContainerTools/skaffold/issues/336

How to use local docker images in kubernetes deployments (NOT minikube)

I have a VM with Kubernetes installed using kubeadm (NOT minikube). The VM acts as the single node of the cluster, with taints removed to allow it to act as both master and worker node (as shown in the Kubernetes documentation).
I have saved, transferred and loaded my app:test image into it. I can easily run a container with it using docker run.
It shows up when I run sudo docker images.
When I create a deployment/pod that uses this image and specify imagePullPolicy: IfNotPresent or Never, I still get the ImagePullBackOff error. The describe command shows that it tries to pull the image from Docker Hub...
Note that when I try to use a local image that was pulled as the result of creating another pod, the image pull policies work without any problem, although that image doesn't appear when I run sudo docker images --all.
How can I use a local image for pods in kubernetes? Is there a way to do it without using a private repository?
image doesn't appear when i run sudo docker images --all
Based on your comment, you are using K8s v1.22, which means your cluster is likely using the containerd container runtime instead of Docker (you can check with kubectl get nodes -o wide and look at the last column, CONTAINER-RUNTIME).
Try listing your images with crictl images and pulling with crictl pull <image_name> to preload the images on the node.
One can do so with a combination of crictl and ctr, if using containerd.
TL;DR: these steps, which are also described in the crictl GitHub documentation:
1- Once you have the image on the node (in my case, a VM), make sure it is in an archive (.tar). You can create one with the docker save or ctr image export commands.
2- Run sudo ctr -n=k8s.io images import myimage.tar from the directory containing the archived image to import it into containerd, in the namespace that Kubernetes uses to track its images. It should now appear when you run sudo crictl images.
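The two steps can be wrapped in a small Python helper so they are repeatable. This is a sketch, assuming Docker is also installed on the node (as in the question, where docker images lists app:test) and that the archive is written to the current directory; the helper names and the default archive name myimage.tar are illustrative. By default it only prints the commands (dry run).

```python
import shlex
import subprocess


def preload_commands(image: str, archive: str = "myimage.tar") -> list[list[str]]:
    """Build the commands that copy an image from Docker's store into containerd's k8s.io namespace."""
    return [
        # 1 - export the image from the local Docker daemon into a tar archive
        ["docker", "save", "-o", archive, image],
        # 2 - import the archive into the containerd namespace the kubelet uses
        ["sudo", "ctr", "-n=k8s.io", "images", "import", archive],
        # verify: the image should now be listed through the CRI
        ["sudo", "crictl", "images"],
    ]


def run(commands: list[list[str]], dry_run: bool = True) -> None:
    """Print each command; execute it only when dry_run is False."""
    for argv in commands:
        print(shlex.join(argv))
        if not dry_run:
            subprocess.run(argv, check=True)


if __name__ == "__main__":
    run(preload_commands("app:test"))  # dry run: only prints the commands
```

After the import, a pod that references app:test with imagePullPolicy: IfNotPresent or Never should find the image locally instead of contacting Docker Hub.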
As suggested, I tried listing images with crictl and my app:test did not appear. However, trying to import my local image through crictl didn't seem to work either. I used crictl pull app:test and it showed the following error message:
FATA[0000] pulling image failed: rpc error: code = Unknown desc = failed to pull and unpack image "docker.io/library/app:test": failed to resolve reference "docker.io/library/app:test": pull access denied, repository does not exist or may require authorization: server message: insufficient_scope: authorization failed.
However, after following these steps, my image is finally recognized as an existing local image in Kubernetes. They are actually the same as those suggested in the crictl GitHub documentation.
How does one explain this? How do images get "registered" in the kubernetes cluster? Why couldn't crictl import the image? I might post another issue to ask that...
Your cluster is bottled up inside your VM, so what you call local will always be remote for the cluster in that VM. Kubernetes is trying to pull those images because it can't find them in the VM.
Docker Hub is the default place to download images from, but you can configure Kubernetes to pull from AWS (ECR), from Azure (ACR), from GitHub Packages (GHCR) and from your own private server.
You've got many ways to solve this, and none of them is easy or will just work.
1 - Easiest: push your images to Docker Hub and let your cluster pull them from there.
2 - Set up a local private container registry and configure your Kubernetes VM to pull from it (see this).
3 - Set up a private container registry inside your Kubernetes cluster and set up scripts in your local environment to push to it (see this).

Kubernetes pull image error in non-default namespace

I have a manifest that works fine in a default namespace. The image is cached on my laptop docker registry and I have set the manifest pull policy to IfNotPresent.
Everything is working fine but then when I switch to a non-default namespace, the pod cannot come up due to pull error:
message: Back-off pulling image "myprivaterepo/myapp:latest"
reason: ImagePullBackOff
The image is in my local docker registry with the same path.
I just wonder why it works in the default namespace but not in a non-default namespace.
Simple: you need to add the registry credentials (the image pull secret) in every namespace.
Look at the docs: https://kubernetes.io/docs/concepts/containers/images/
Note: Pods can only reference image pull secrets in their own namespace, so this process needs to be done one time per namespace
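Because pods can only use pull secrets from their own namespace, the secret has to exist in each namespace that runs the pod. As a minimal sketch, the Python below builds a kubernetes.io/dockerconfigjson Secret per namespace (the same Secret type that kubectl create secret docker-registry produces) and wraps them in a v1 List that can be piped to kubectl apply -f -. The secret name regcred, the server registry.example.com, the namespace my-namespace and the credentials are placeholders, not values from the question.

```python
import base64
import json


def b64(text: str) -> str:
    """Base64-encode a string, as required for Secret data values."""
    return base64.b64encode(text.encode()).decode()


def pull_secret(namespace: str, server: str, username: str, password: str,
                name: str = "regcred") -> dict:
    """Build a docker-registry image pull secret for one namespace."""
    docker_config = {
        "auths": {
            server: {
                "username": username,
                "password": password,
                "auth": b64(f"{username}:{password}"),  # base64("user:password")
            }
        }
    }
    return {
        "apiVersion": "v1",
        "kind": "Secret",
        "metadata": {"name": name, "namespace": namespace},
        "type": "kubernetes.io/dockerconfigjson",
        "data": {".dockerconfigjson": b64(json.dumps(docker_config))},
    }


if __name__ == "__main__":
    # Placeholder values: replace with your registry, credentials and namespaces.
    namespaces = ["default", "my-namespace"]
    items = [pull_secret(ns, "registry.example.com", "user", "password") for ns in namespaces]
    print(json.dumps({"apiVersion": "v1", "kind": "List", "items": items}, indent=2))
```

Pods in each namespace then reference the secret by name in their spec, e.g. imagePullSecrets: [{name: regcred}], or the secret can be attached to the namespace's service account.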

Failed to pull image "velero/velero-plugin-for-gcp:v1.1.0" while installing Velero in GKE Cluster

I'm trying to install and configure Velero for Kubernetes backups. I followed the link to configure it in my GKE cluster. The installation went fine, but Velero is not working.
I am using Google Cloud Shell to run all my commands (I have installed and configured the Velero client in my Cloud Shell).
On further inspection of the Velero deployment and pods, I found that it cannot pull the image from the Docker repository.
kubectl get pods -n velero
NAME                      READY   STATUS              RESTARTS   AGE
velero-5489b955f6-kqb7z   0/1     Init:ErrImagePull   0          20s
Events from the Velero pod (kubectl describe pod; output trimmed to the relevant part):
Events:
Type     Reason     Age               From                                                  Message
----     ------     ----              ----                                                  -------
Normal   Scheduled  38s               default-scheduler                                     Successfully assigned velero/velero-5489b955f6-kqb7z to gke-gke-cluster1-default-pool-a354fba3-8674
Warning  Failed     22s               kubelet, gke-gke-cluster1-default-pool-a354fba3-8674  Failed to pull image "velero/velero-plugin-for-gcp:v1.1.0": rpc error: code = Unknown desc = Error response from daemon: Get https://registry-1.docker.io/v2/: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
Warning  Failed     22s               kubelet, gke-gke-cluster1-default-pool-a354fba3-8674  Error: ErrImagePull
Normal   BackOff    21s               kubelet, gke-gke-cluster1-default-pool-a354fba3-8674  Back-off pulling image "velero/velero-plugin-for-gcp:v1.1.0"
Warning  Failed     21s               kubelet, gke-gke-cluster1-default-pool-a354fba3-8674  Error: ImagePullBackOff
Normal   Pulling    8s (x2 over 37s)  kubelet, gke-gke-cluster1-default-pool-a354fba3-8674  Pulling image "velero/velero-plugin-for-gcp:v1.1.0"
Command used to install Velero (some of the values are given as variables):
velero install \
--provider gcp \
--plugins velero/velero-plugin-for-gcp:v1.1.0 \
--bucket $storagebucket \
--secret-file ~/velero-backup-storage-sa-key.json
Velero Version
velero version
Client:
Version: v1.4.2
Git commit: 56a08a4d695d893f0863f697c2f926e27d70c0c5
<error getting server version: timed out waiting for server status request to be processed>
GKE version
v1.15.12-gke.2
Isn't this a private cluster? - mario
@mario this is a private cluster, but I can deploy other services without any issues (e.g. I have deployed nginx successfully). - Sreesan
Well, this is a known limitation of GKE private clusters. As you can read in the documentation:
Can't pull image from public Docker Hub
Symptoms
A Pod running in your cluster displays a warning in kubectl describe such as Failed to pull image: rpc error: code = Unknown desc = Error response from daemon: Get https://registry-1.docker.io/v2/: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
Potential causes
Nodes in a private cluster do not have outbound access to the public internet. They have limited access to Google APIs and services, including Container Registry.
Resolution
You cannot fetch images directly from Docker Hub. Instead, use images hosted on Container Registry. Note that while Container Registry's Docker Hub mirror is accessible from a private cluster, it should not be exclusively relied upon. The mirror is only a cache, so images are periodically removed, and a private cluster is not able to fall back to Docker Hub.
You can also compare it with this answer.
You can easily verify this yourself with a simple experiment: run two different nginx deployments, the first based on the nginx image (which is equivalent to nginx:latest) and the second based on nginx:1.14.2.
The first one works, because the nginx:latest image can be pulled from Container Registry's Docker Hub mirror, which is accessible from a private cluster. Any attempt to pull nginx:1.14.2, however, will fail, as you'll see in the Pod events. This happens because the kubelet cannot find this version of the image in GCR, so it tries to pull it from the public Docker registry (https://registry-1.docker.io/v2/), which is not possible in a private cluster: as the docs say, the mirror is only a cache and a private cluster cannot fall back to Docker Hub.
If you still have doubts, just SSH into your node and run the following commands:
curl https://cloud.google.com/container-registry/
curl https://registry-1.docker.io/v2/
While the first one works perfectly, the second one will eventually fail:
curl: (7) Failed to connect to registry-1.docker.io port 443: Connection timed out
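If you prefer a script, roughly the same connectivity check can be done from Python. This is a minimal sketch: it only tests whether a TCP connection to port 443 can be opened within a timeout, which is the step that fails in the curl error above; unlike curl, it does not perform TLS or send an HTTP request. The function name can_reach is hypothetical.

```python
import socket


def can_reach(host: str, port: int = 443, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # covers timeouts, refused connections and DNS failures
        return False


if __name__ == "__main__":
    for host in ("cloud.google.com", "registry-1.docker.io"):
        print(host, "reachable" if can_reach(host) else "NOT reachable")
```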
Reason? "Nodes in a private cluster do not have outbound access to the public internet."
Solution?
You can search what is currently available in GCR here.
In many cases you should be able to get the required image if you don't specify its exact version (by default the latest tag is used). While this helps with nginx, unfortunately no version of velero/velero-plugin-for-gcp is currently available in Container Registry's Docker Hub mirror.
Granting private nodes outbound internet access by using Cloud NAT seems the only reasonable solution that can be applied in your case.
I solved this problem by realizing that the version of velero/velero-plugin-for-gcp does not follow the version of velero/velero.
For example, the latest versions right now are velero/velero:v1.9.1 and velero/velero-plugin-for-gcp:v1.5.0.

Kubeflow: Image Pull --> no space left on device

Is there any way to clean all cached Docker images etc. from a Kubernetes setup to free up space on the master nodes?
I am trying to install a deployment, but Kubernetes reports "no space left on device" while pulling images.
I am kind of surprised that an 80 GB disk is not enough for one simple deployment, because the cluster has now been completely emptied.
Does anyone have an idea how to wipe out all unused Docker images etc.?
Successfully pulled image "tensorflow/serving:1.11.1"
Warning Failed 4m30s kubelet, 192.168.10.37 Failed to pull image "gcr.io/kubeflow-images-public/tf-model-server-http-proxy:v20180606-9dfda4f2": rpc error: code = Unknown desc = failed to register layer: Error processing tar file(exit status 1): write /usr/lib/python3.5/idlelib/__pycache__/CodeContext.cpython-35.pyc: no space left on device
Warning Failed 4m27s (x3 over 4m29s) kubelet, 192.168.10.37 Error: ImagePullBackOff
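You can run docker image prune to clean up unused images (by default it removes only dangling images; add -a to remove all images not used by any container), or docker system prune to clean up all unused Docker resources.
You can also configure the garbage collection feature of Kubernetes.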
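Kubernetes image garbage collection is driven by two kubelet thresholds: when disk usage reaches the high threshold, the kubelet deletes unused images, starting with the least recently used, until usage falls to the low threshold. The Python sketch below models that decision under simplifying assumptions (a single image filesystem, known image sizes, no minimum image age). The Image class and images_to_delete function are hypothetical; the default percentages 85 and 80 correspond to the kubelet's imageGCHighThresholdPercent and imageGCLowThresholdPercent settings.

```python
from dataclasses import dataclass


@dataclass
class Image:
    name: str
    size: int        # bytes on disk
    last_used: int   # e.g. a timestamp; smaller means used longer ago
    in_use: bool = False


def images_to_delete(images: list[Image], capacity: int, usage: int,
                     high_pct: int = 85, low_pct: int = 80) -> list[str]:
    """Simplified model of kubelet image GC: names of images to delete, oldest first."""
    if usage * 100 < high_pct * capacity:
        return []  # below the high threshold: GC does not run
    to_free = usage - low_pct * capacity // 100  # bytes needed to reach the low threshold
    victims = []
    # Only images not used by any container are candidates, least recently used first
    for img in sorted((i for i in images if not i.in_use), key=lambda i: i.last_used):
        if to_free <= 0:
            break
        victims.append(img.name)
        to_free -= img.size
    return victims
```

Lowering these thresholds in the kubelet configuration makes garbage collection start earlier, before a pull runs out of space.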