Kubernetes minikube - can pull from docker registry manually, but rolling deployments won't pull

I have a Kubernetes minikube running a deployment / service.
When I try to update the image to a new version (from my registry on a separate machine) as follows:
kubectl set image deployment/flask-deployment-yaml flask-api-endpoint=192.168.1.201:5000/test_flask:2
It fails with the errors:
Failed to pull image "192.168.1.201:5000/test_flask:2": rpc error:
code = 2 desc = Error: image test_flask:2 not found
If I log on to my minikube server and manually pull the docker image as follows:
$ docker pull 192.168.1.201:5000/test_flask:2
2: Pulling from test_flask
280aca6ddce2: Already exists
3c0df3e97827: Already exists
669c8479e3f7: Pull complete
83323a067779: Pull complete
Digest: sha256:0f9650465284215d48ad0efe06dc888c50928b923ecc982a1b3d6fa38d
Status: Downloaded newer image for 192.168.1.201:5000/test_flask:2
It works, and then my deployment update suddenly succeeds, presumably because the image now exists locally.
I'm not sure why the deployment update doesn't just work straight away...
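A quick way to test that theory, as a hedged sketch (the jsonpath assumes the deployment has a single container), is to check the container's imagePullPolicy; with the default IfNotPresent, a locally present image is used without the kubelet contacting the registry at all:
$ kubectl get deployment flask-deployment-yaml -o jsonpath='{.spec.template.spec.containers[0].imagePullPolicy}'
If it prints IfNotPresent (the default for non-:latest tags), the manual docker pull would indeed mask whatever is preventing the kubelet from pulling directly.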
More deployment details:
Name: flask-deployment-yaml
Namespace: default
CreationTimestamp: Sat, 07 Oct 2017 15:57:24 +0100
Labels: app=front-end
Annotations: deployment.kubernetes.io/revision=2
Selector: app=front-end
Replicas: 4 desired | 4 updated | 4 total | 4 available | 0 unavailable
StrategyType: RollingUpdate
MinReadySeconds: 0
RollingUpdateStrategy: 1 max unavailable, 1 max surge
Pod Template:
Labels: app=front-end
Containers:
flask-api-endpoint:
Image: 192.168.1.201:5000/test_flask:2
Port: 5000/TCP
Environment: <none>
Mounts: <none>
Volumes: <none>
Conditions:
Type Status Reason
---- ------ ------
Available True MinimumReplicasAvailable
OldReplicaSets: <none>
NewReplicaSet: flask-deployment-yaml-1174202895 (4/4 replicas created)

You should either delete your minikube cluster and start it again with the --insecure-registry flag, which allows pulling from insecure registries (see the sketch after the links below), or use a registry that is reachable through localhost and port-forward into the minikube cluster, since Docker won't refuse to pull from localhost. More details here:
- https://github.com/kubernetes/minikube/blob/master/docs/insecure_registry.md
- https://github.com/kubernetes/minikube/issues/604
More details and illustrations of the problem and how to fix it are here: https://blog.hasura.io/sharing-a-local-registry-for-minikube-37c7240d0615
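A minimal sketch of the first option, using the registry address from the question (the flag only takes effect when the minikube VM is first created, which is why the cluster has to be recreated):
$ minikube delete
$ minikube start --insecure-registry="192.168.1.201:5000"
After that, the Docker daemon inside minikube should accept pulls from the HTTP registry without the manual docker pull workaround.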

Related

Kubernetes: cert-manager/secret-for-certificate-mapper "msg"="unable to fetch certificate that owns the secret"

Cert-manager/secret-for-certificate-mapper "msg"="unable to fetch certificate that owns the secret" "error"="Certificate.cert-manager.io "grafanaps-tls" not found"
So, from the investigation, I'm not able to find the grafanaps-tls certificate:
kubectl get certificates
NAME READY SECRET AGE
Alertmanagerdf-tls False alertmanagerdf-tls 1y61d
Prometheusps-tls False prometheusps-tls 1y58
We have done the following: the nginx ingress controller and cert-manager were outdated and no longer compatible with Kubernetes 1.22, so an upgrade of those components was initiated in order to restore pod operation.
The cmctl check api -n cert-manager command now succeeds; the cert-manager API has been upgraded to version 1.7 and orphaned secrets have been cleaned up.
Cert-manager/webhook "msg"="Detected root CA rotation - regenerating serving certificates"
After a restart the logs looked mostly clean.
From my findings, the issue is the integration of cert-manager with the Kubernetes ingress controller.
So I was mostly interested in the cert-manager configuration, in particular the ingress-shim configuration and the args section.
It appears that the SSL certificates for several servers have expired, which looks like an issue with the Certificate resources or with the integration of cert-manager with the Kubernetes ingress controller.
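A few diagnostic commands that might narrow this down, as a hedged sketch (the certificate name is taken from the kubectl get certificates output above; the namespace placeholder is an assumption):
$ kubectl describe certificate alertmanagerdf-tls -n <namespace>
$ kubectl get certificaterequests -A
$ cmctl status certificate alertmanagerdf-tls -n <namespace>
The describe output and the associated CertificateRequest resources usually state why a Certificate is stuck at Ready=False, e.g. a failing issuer, a missing secret, or an ACME challenge that never completes.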
Config:
C:\Windows\system32>kubectl describe deployment cert-manager-cabictor -n cert-manager
Name: cert-manager-cabictor
Namespace: cert-manager
CreationTimestamp: Thu, 01 Dec 2022 18:31:02 +0530
Labels: app=cabictor
app.kubernetes.io/component=cabictor
app.kubernetes.io/instance=cert-manager
app.kubernetes.io/managed-by=Helm
app.kubernetes.io/name=cabictor
app.kubernetes.io/version=v1.7.3
helm.sh/chart=cert-manager-v1.7.3
Annotations: deployment.kubernetes.io/revision: 2
meta.helm.sh/release-name: cert-manager
meta.helm.sh/release-namespace: cert-manager
Selector: app.kubernetes.io/component=cabictor,app.kubernetes.io/instance=cert-manager,app.kubernetes.io/name=cabictor
Replicas: 1 desired | 1 updated | 1 total | 1 available | 0 unavailable
StrategyType: RollingUpdate
MinReadySeconds: 0
RollingUpdateStrategy: 25% max unavailable, 25% max surge
Pod Template:
Labels: app=cabictor
app.kubernetes.io/component=cabictor
app.kubernetes.io/instance=cert-manager
app.kubernetes.io/managed-by=Helm
app.kubernetes.io/name=cabictor
app.kubernetes.io/version=v1.7.3
helm.sh/chart=cert-manager-v1.7.3
Service Account: cert-manager-cabictor
Containers:
cert-manager:
Image: quay.io/jetstack/cert-manager-cabictor:v1.7.3
Port: <none>
Host Port: <none>
Args:
--v=2
--leader-election-namespace=kube-system
Environment:
POD_NAMESPACE: (v1:metadata.namespace)
Mounts: <none>
Volumes: <none>
Conditions:
Type Status Reason
---- ------ ------
Progressing True NewReplicaSetAvailable
Available True MinimumReplicasAvailable
OldReplicaSets: <none>
NewReplicaSet: cert-manager-cabictor-5b65bcdbbd (1/1 replicas created)
Events: <none>
I was not able to identify and fix the root cause here.
What is the problem here, and how can it be resolved? Any help would be greatly appreciated.
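If the suspicion is the ingress-shim integration, a hedged starting point (resource names are placeholders) is to confirm that an issuer exists and that each Ingress carries the annotation ingress-shim watches for:
$ kubectl get clusterissuers,issuers --all-namespaces
$ kubectl describe ingress <your-ingress> -n <namespace>
Without a cert-manager.io/cluster-issuer (or cert-manager.io/issuer) annotation on the Ingress, ingress-shim will not create or renew Certificates for it.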

Why does my Velero backup for MinIO storage fail with the error "An error occurred: gzip: invalid header"?

I have installed the MinIO example from the vmware-tanzu Velero repository. The MinIO example setup is running and I have exposed it with a NodePort service. Then I ran the following command to install Velero:
velero install --provider aws --plugins velero/velero-plugin-for-aws:v1.0.0 --bucket velero --secret-file ./credentials-velero --use-volume-snapshots=false --backup-location-config region=minio,s3ForcePathStyle="true",s3Url=http://123.123.123.123:30804
When I check the Velero logs I see these errors:
time="2023-02-05T10:45:30Z" level=error msg="fail to validate backup store" backup-storage-location=velero/default controller =backup-storage-location error="rpc error: code = Unknown desc = InvalidArgument: S3 API Requests must be made to API port.\n \tstatus code: 400, request id: , host id: " error.file="/go/src/github.com/vmware-tanzu/velero/pkg/persistence/object_store. go:191" error.function="github.com/vmware-tanzu/velero/pkg/persistence.(*objectBackupStore).IsValid" logSource="pkg/controlle r/backup_storage_location_controller.go:154"
time="2023-02-05T10:45:30Z" level=info msg="BackupStorageLocation is invalid, marking as unavailable" backup-storage-location =velero/default controller=backup-storage-location logSource="pkg/controller/backup_storage_location_controller.go:130"
time="2023-02-05T10:45:30Z" level=error msg="Error listing backups in backup store" backupLocation=velero/default controller= backup-sync error="rpc error: code = Unknown desc = InvalidArgument: S3 API Requests must be made to API port.\n\tstatus code : 400, request id: , host id: " error.file="/go/src/github.com/vmware-tanzu/velero-plugin-for-aws/velero-plugin-for-aws/objec t_store.go:308" error.function="main.(*ObjectStore).ListCommonPrefixes" logSource="pkg/controller/backup_sync_controller.go:1 07"
time="2023-02-05T10:45:30Z" level=error msg="Current BackupStorageLocations available/unavailable/unknown: 0/0/1, BackupStora geLocation \"default\" is unavailable: rpc error: code = Unknown desc = InvalidArgument: S3 API Requests must be made to API port.\n\tstatus code: 400, request id: , host id: )" controller=backup-storage-location logSource="pkg/controller/backup_stor age_location_controller.go:191"
As far as I can see, Velero didn't manage to validate the backup location, even though I provided what I believe is the correct URL.
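If the NodePort 30804 is the MinIO console rather than the S3 API, that would explain the "S3 API Requests must be made to API port" message. A hedged sketch of re-pointing Velero at the API port (the <api-nodeport> value is a placeholder; check which NodePort maps to container port 9000 first):
$ kubectl -n velero get svc
$ velero install --provider aws --plugins velero/velero-plugin-for-aws:v1.0.0 --bucket velero --secret-file ./credentials-velero --use-volume-snapshots=false --backup-location-config region=minio,s3ForcePathStyle="true",s3Url=http://123.123.123.123:<api-nodeport>
Alternatively, the existing BackupStorageLocation can be edited in place with kubectl -n velero edit backupstoragelocation default and the spec.config.s3Url field updated there.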
When I run
kubectl describe deployment -n velero velero
I get the following output:
Name: velero
Namespace: velero
CreationTimestamp: Sun, 05 Feb 2023 12:45:24 +0200
Labels: component=velero
Annotations: deployment.kubernetes.io/revision: 1
Selector: deploy=velero
Replicas: 1 desired | 1 updated | 1 total | 1 available | 0 unavailable
StrategyType: RollingUpdate
MinReadySeconds: 0
RollingUpdateStrategy: 25% max unavailable, 25% max surge
Pod Template:
Labels: component=velero
deploy=velero
Annotations: prometheus.io/path: /metrics
prometheus.io/port: 8085
prometheus.io/scrape: true
Service Account: velero
Init Containers:
velero-velero-plugin-for-aws:
Image: velero/velero-plugin-for-aws:v1.0.0
Port: <none>
Host Port: <none>
Environment: <none>
Mounts:
/target from plugins (rw)
Containers:
velero:
Image: velero/velero:v1.10.1-rc.1
Port: 8085/TCP
Host Port: 0/TCP
Command:
/velero
Args:
server
--features=
--uploader-type=restic
Limits:
cpu: 1
memory: 512Mi
Requests:
cpu: 500m
memory: 128Mi
Environment:
VELERO_SCRATCH_DIR: /scratch
VELERO_NAMESPACE: (v1:metadata.namespace)
LD_LIBRARY_PATH: /plugins
GOOGLE_APPLICATION_CREDENTIALS: /credentials/cloud
AWS_SHARED_CREDENTIALS_FILE: /credentials/cloud
AZURE_CREDENTIALS_FILE: /credentials/cloud
ALIBABA_CLOUD_CREDENTIALS_FILE: /credentials/cloud
Mounts:
/credentials from cloud-credentials (rw)
/plugins from plugins (rw)
/scratch from scratch (rw)
Volumes:
plugins:
Type: EmptyDir (a temporary directory that shares a pod's lifetime)
Medium:
SizeLimit: <unset>
scratch:
Type: EmptyDir (a temporary directory that shares a pod's lifetime)
Medium:
SizeLimit: <unset>
cloud-credentials:
Type: Secret (a volume populated by a Secret)
SecretName: cloud-credentials
Optional: false
Conditions:
Type Status Reason
---- ------ ------
Available True MinimumReplicasAvailable
Progressing True NewReplicaSetAvailable
OldReplicaSets: <none>
NewReplicaSet: velero-86f4984c96 (1/1 replicas created)
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal ScalingReplicaSet 2m45s deployment-controller Scaled up replica set velero-86f4984c96 to 1
Here all the variables are set correctly, but the volume path is not set. I am not sure where the issue is, basically.
It might also help that when I look at the Velero backup location, it is unavailable:
master-k8s@masterk8s-virtual-machine:~/velero-v1.2.0-darwin-amd64$ velero backup-location get
NAME PROVIDER BUCKET/PREFIX PHASE LAST VALIDATED ACCESS MODE DEFAULT
default aws velero Unavailable 2023-02-05 12:52:30 +0200 EET ReadWrite true
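One way to check which port the MinIO S3 API is actually exposed on, as a hedged sketch (the service name minio in the velero namespace matches the upstream example, but yours may differ):
$ kubectl -n velero get svc minio -o wide
The NodePort that maps to container port 9000 is the S3 API; a NodePort that maps to the MinIO console port (9001 in recent images) rejects S3 calls with exactly this kind of 400 error.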
What I have done so far
I have tried different blogs and Stack Overflow questions to fix this. The issue is not resolved.
What is the main cause?
The main cause of this issue is that I am not able to understand the logs. I want to know why Velero is not able to find the location in my case, whereas in different blogs the same setup works.
What I want to do, or how can you help me?
Please help me find the main issue. My backup fails with this error:
master-k8s@masterk8s-virtual-machine:~/velero-v1.2.0-darwin-amd64$ velero backup create mytest --include-namespaces postgres-operator
Backup request "mytest" submitted successfully.
Run `velero backup describe mytest` or `velero backup logs mytest` for more details.
master-k8s@masterk8s-virtual-machine:~/velero-v1.2.0-darwin-amd64$ velero backup get
NAME STATUS ERRORS WARNINGS CREATED EXPIRES STORAGE LOCATION SELECTOR
mytest Failed 0 0 2023-02-05 13:00:03 +0200 EET 29d default <none>
master-k8s@masterk8s-virtual-machine:~/velero-v1.2.0-darwin-amd64$ velero backup logs mytest
An error occurred: gzip: invalid header
I will be really thankful for your help and support in advance.

Troubleshooting Kubernetes Tutorial

I am working through the coarse parallel processing Kubernetes tutorial located at https://kubernetes.io/docs/tasks/job/coarse-parallel-processing-work-queue/#before-you-begin . I have my cluster set up with Rancher on AWS using EC2 instances. When I run
kubectl apply -f ./job.yaml
kubectl describe jobs/job-wq-1
I receive the following output
Name: job-wq-1
Namespace: default
Selector: controller-uid=5f9e1780-a1b9-11e9-a6b7-026525d9a49a
Labels: controller-uid=5f9e1780-a1b9-11e9-a6b7-026525d9a49a
job-name=job-wq-1
Annotations: kubectl.kubernetes.io/last-applied-configuration:
{"apiVersion":"batch/v1","kind":"Job","metadata":{"annotations":{},"name":"job-wq-1","namespace":"default"},"spec":{"completions":8,"paral...
Parallelism: 2
Completions: 8
Start Time: Mon, 08 Jul 2019 15:48:35 -0400
Pods Statuses: 0 Running / 0 Succeeded / 2 Failed
Pod Template:
Labels: controller-uid=5f9e1780-a1b9-11e9-a6b7-026525d9a49a
job-name=job-wq-1
Containers:
c:
Image: mgladden/job-wq-1
Port: <none>
Host Port: <none>
Environment:
BROKER_URL: amqp://guest:guest@rabbitmq-service:5672
QUEUE: job1
Mounts: <none>
Volumes: <none>
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal SuccessfulCreate 10m job-controller Created pod: job-wq-1-z8kn6
Normal SuccessfulCreate 10m job-controller Created pod: job-wq-1-lqcfs
Normal SuccessfulDelete 9m35s job-controller Deleted pod: job-wq-1-z8kn6
Normal SuccessfulDelete 9m35s job-controller Deleted pod: job-wq-1-lqcfs
I am unsure of how to troubleshoot at this point. It appears that none of the pods succeeded. Could it be due to my Rancher setup? I did notice that in the tutorial the Annotations field was blank, whereas mine has output from my work.
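A hedged way to dig further (the label selector comes from the Job above; the pods must still exist, or the Job must be re-run, for logs to be available):
$ kubectl get pods -l job-name=job-wq-1
$ kubectl logs <one-of-the-job-pods>
$ kubectl describe pod <one-of-the-job-pods>
$ kubectl get svc rabbitmq-service
If the worker cannot reach the AMQP broker at rabbitmq-service:5672, the failed pods' logs usually show a connection or socket error, which is what surfaced in the follow-up below.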
Thanks for the help. I checked the error logs and found the following error: "logging in to AMQP server: a socket error occurred". It looks to be a problem with using the older Ubuntu 14.04 base image when building the Docker image. When I switched to Ubuntu 18.04, the tutorial finished as expected.
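A minimal sketch of that change, assuming the worker image from the tutorial is built from a local Dockerfile and pushed under the image name shown in the Job output above:
# In the worker's Dockerfile, change the base image line
#   FROM ubuntu:14.04
# to
#   FROM ubuntu:18.04
$ docker build -t mgladden/job-wq-1 .
$ docker push mgladden/job-wq-1
$ kubectl delete job job-wq-1 && kubectl apply -f ./job.yaml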

Kubernetes Keeps Restarting Pods of StatefulSet in Minikube With "Need to kill pod"

Minikube version v0.24.1
kubernetes version 1.8.0
The problem that I am facing is that I have several statefulsets created in minikube each with one pod.
Sometimes when I start up minikube, my pods will start up initially and then keep being restarted by Kubernetes. They go from the creating-container state, to running, to terminating, over and over.
Now I've seen Kubernetes kill and restart things before if it detects disk pressure, memory pressure, or some other condition like that, but that's not the case here: those flags are not raised, and the only message in the pod's event log is "Need to kill pod".
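For completeness, a hedged way to confirm that no node conditions are being raised (minikube runs a single node, typically named minikube):
$ kubectl describe node minikube
$ kubectl get events --namespace=storage --sort-by=.metadata.creationTimestamp
The node's Conditions block shows MemoryPressure, DiskPressure and similar flags, and the sorted events show what acted on the pod and when.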
What's most confusing is that this issue doesn't happen all the time, and I'm not sure how to trigger it. My minikube setup will work for a week or more without this happening then one day I'll start minikube up and the pods for my statefulsets just keep restarting. So far the only workaround I've found is to delete my minikube instance and set it up again from scratch, but obviously this is not ideal.
Below is a sample of one of the StatefulSets whose pod keeps getting restarted. As seen in the events, Kubernetes is deleting the pod and starting it again, repeatedly. I'm unable to figure out why it keeps doing that and why it only gets into this state sometimes.
$ kubectl describe statefulsets mongo --namespace=storage
Name: mongo
Namespace: storage
CreationTimestamp: Mon, 08 Jan 2018 16:11:39 -0600
Selector: environment=test,role=mongo
Labels: name=mongo
Annotations: kubectl.kubernetes.io/last-applied-configuration={"apiVersion":"apps/v1beta1","kind":"StatefulSet","metadata":{"annotations":{},"labels":{"name":"mongo"},"name":"mongo","namespace":"storage"},"...
Replicas: 1 desired | 1 total
Pods Status: 1 Running / 0 Waiting / 0 Succeeded / 0 Failed
Pod Template:
Labels: environment=test
role=mongo
Containers:
mongo:
Image: mongo:3.4.10-jessie
Port: 27017/TCP
Command:
mongod
--replSet
rs0
--smallfiles
--noprealloc
Environment: <none>
Mounts:
/data/db from mongo-persistent-storage (rw)
mongo-sidecar:
Image: cvallance/mongo-k8s-sidecar
Port: <none>
Environment:
MONGO_SIDECAR_POD_LABELS: role=mongo,environment=test
KUBERNETES_MONGO_SERVICE_NAME: mongo
Mounts: <none>
Volumes: <none>
Volume Claims:
Name: mongo-persistent-storage
StorageClass:
Labels: <none>
Annotations: volume.alpha.kubernetes.io/storage-class=default
Capacity: 5Gi
Access Modes: [ReadWriteOnce]
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal SuccessfulDelete 23m (x46 over 1h) statefulset delete Pod mongo-0 in StatefulSet mongo successful
Normal SuccessfulCreate 3m (x62 over 1h) statefulset create Pod mongo-0 in StatefulSet mongo successful
After some more digging, there seems to have been a bug affecting StatefulSets that creates multiple controllers for the same StatefulSet:
https://github.com/kubernetes/kubernetes/issues/56355
This issue seems to have been fixed, with the fix backported to Kubernetes 1.8 and included in 1.9, but minikube doesn't yet ship the fixed version. A workaround, if your system enters this state, is to list the controller revisions like so:
$ kubectl get controllerrevisions --namespace=storage
NAME CONTROLLER REVISION AGE
mongo-68bd5cbcc6 StatefulSet/mongo 1 19h
mongo-68bd5cbcc7 StatefulSet/mongo 1 7d
and delete the duplicate controllers for each statefulset.
$ kubectl delete controllerrevisions mongo-68bd5cbcc6 --namespace=storage
or simply use Kubernetes version 1.9 or above, which includes this bug fix.

No nodes available to schedule pods, using google container engine

I'm having an issue where a container I'd like to run doesn't appear to be getting started on my cluster.
I've tried searching around for possible solutions, but there's a surprising lack of information out there to assist with this issue or anything of its nature.
Here's the most I could gather:
$ kubectl describe pods/elasticsearch
Name: elasticsearch
Namespace: default
Image(s): my.image.host/my-project/elasticsearch
Node: /
Labels: <none>
Status: Pending
Reason:
Message:
IP:
Replication Controllers: <none>
Containers:
elasticsearch:
Image: my.image.host/my-project/elasticsearch
Limits:
cpu: 100m
State: Waiting
Ready: False
Restart Count: 0
Events:
FirstSeen LastSeen Count From SubobjectPath Reason Message
Mon, 19 Oct 2015 10:28:44 -0500 Mon, 19 Oct 2015 10:34:09 -0500 12 {scheduler } failedScheduling no nodes available to schedule pods
I also see this:
$ kubectl get pod elasticsearch -o wide
NAME READY STATUS RESTARTS AGE NODE
elasticsearch 0/1 Pending 0 5s
I guess I'd like to know: What prerequisites exist so that I can be confident that my container is going to run in container engine? What do I need to do in this scenario to get it running?
Here's my yml file:
apiVersion: v1
kind: Pod
metadata:
name: elasticsearch
spec:
containers:
- name: elasticsearch
image: my.image.host/my-project/elasticsearch
ports:
- containerPort: 9200
resources:
volumeMounts:
- name: elasticsearch-data
mountPath: /usr/share/elasticsearch
volumes:
- name: elasticsearch-data
gcePersistentDisk:
pdName: elasticsearch-staging
fsType: ext4
Here's some more output about my node:
$ kubectl get nodes
NAME LABELS STATUS
gke-elasticsearch-staging-00000000-node-yma3 kubernetes.io/hostname=gke-elasticsearch-staging-00000000-node-yma3 NotReady
You only have one node in your cluster and its status is NotReady, so you won't be able to schedule any pods. You can try to determine why your node isn't ready by looking in /var/log/kubelet.log. You can also add new nodes to your cluster (scale the cluster size up to 2) or delete the node (it will be automatically replaced by the instance group manager) to see if either of those options gets you a working node.
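A hedged sketch of those options for a GKE cluster (the cluster name is inferred from the node name above and the zone is a placeholder; adjust both):
$ kubectl describe node gke-elasticsearch-staging-00000000-node-yma3
$ gcloud container clusters resize elasticsearch-staging --num-nodes=2 --zone=<your-zone>
kubectl describe node prints the node's Conditions, which usually explain why it is NotReady.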
It appears that the scheduler couldn't see any nodes in your cluster. You can run kubectl get nodes and gcloud compute instances list to confirm whether you have any nodes in the cluster. Did you correctly specify the number of nodes (--num-nodes) when creating the cluster?
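For reference, a hedged example of creating a cluster with an explicit node count and then confirming the nodes exist (names and zone are placeholders):
$ gcloud container clusters create elasticsearch-staging --num-nodes=2 --zone=<your-zone>
$ kubectl get nodes
$ gcloud compute instances list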