Why does my Velero backup fail for MinIO storage with the error "An error occurred: gzip: invalid header"? - kubernetes

I have installed the MinIO example from velero vmware-tanzu. The MinIO example setup is running and I have set up the NodePort service. Then I ran the following command to install Velero.
velero install --provider aws --plugins velero/velero-plugin-for-aws:v1.0.0 --bucket velero --secret-file ./credentials-velero --use-volume-snapshots=false --backup-location-config region=minio,s3ForcePathStyle="true",s3Url=http://123.123.123.123:30804
When I check the logs for velero I see this error.
time="2023-02-05T10:45:30Z" level=error msg="fail to validate backup store" backup-storage-location=velero/default controller =backup-storage-location error="rpc error: code = Unknown desc = InvalidArgument: S3 API Requests must be made to API port.\n \tstatus code: 400, request id: , host id: " error.file="/go/src/github.com/vmware-tanzu/velero/pkg/persistence/object_store. go:191" error.function="github.com/vmware-tanzu/velero/pkg/persistence.(*objectBackupStore).IsValid" logSource="pkg/controlle r/backup_storage_location_controller.go:154"
time="2023-02-05T10:45:30Z" level=info msg="BackupStorageLocation is invalid, marking as unavailable" backup-storage-location =velero/default controller=backup-storage-location logSource="pkg/controller/backup_storage_location_controller.go:130"
time="2023-02-05T10:45:30Z" level=error msg="Error listing backups in backup store" backupLocation=velero/default controller= backup-sync error="rpc error: code = Unknown desc = InvalidArgument: S3 API Requests must be made to API port.\n\tstatus code : 400, request id: , host id: " error.file="/go/src/github.com/vmware-tanzu/velero-plugin-for-aws/velero-plugin-for-aws/objec t_store.go:308" error.function="main.(*ObjectStore).ListCommonPrefixes" logSource="pkg/controller/backup_sync_controller.go:1 07"
time="2023-02-05T10:45:30Z" level=error msg="Current BackupStorageLocations available/unavailable/unknown: 0/0/1, BackupStora geLocation \"default\" is unavailable: rpc error: code = Unknown desc = InvalidArgument: S3 API Requests must be made to API port.\n\tstatus code: 400, request id: , host id: )" controller=backup-storage-location logSource="pkg/controller/backup_stor age_location_controller.go:191"
As I can see, Velero did not manage to find the backup location even though I provided the correct URL.
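Since the log complains that S3 API requests must be made to the API port, it may be worth double-checking which container port the NodePort 30804 actually maps to. A minimal check, assuming the MinIO example was installed into the velero namespace with a Service named minio (adjust the names to your own setup):
# Show the Service ports; the NodePort used in s3Url should map to MinIO's
# S3 API port (9000 in the example manifests), not the web console port.
kubectl get svc minio -n velero -o wide
# Optionally confirm the endpoint answers MinIO health checks:
curl -i http://123.123.123.123:30804/minio/health/ready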
When I run
kubectl describe deployment -n velero velero
I found this output.
Name: velero
Namespace: velero
CreationTimestamp: Sun, 05 Feb 2023 12:45:24 +0200
Labels: component=velero
Annotations: deployment.kubernetes.io/revision: 1
Selector: deploy=velero
Replicas: 1 desired | 1 updated | 1 total | 1 available | 0 unavailable
StrategyType: RollingUpdate
MinReadySeconds: 0
RollingUpdateStrategy: 25% max unavailable, 25% max surge
Pod Template:
Labels: component=velero
deploy=velero
Annotations: prometheus.io/path: /metrics
prometheus.io/port: 8085
prometheus.io/scrape: true
Service Account: velero
Init Containers:
velero-velero-plugin-for-aws:
Image: velero/velero-plugin-for-aws:v1.0.0
Port: <none>
Host Port: <none>
Environment: <none>
Mounts:
/target from plugins (rw)
Containers:
velero:
Image: velero/velero:v1.10.1-rc.1
Port: 8085/TCP
Host Port: 0/TCP
Command:
/velero
Args:
server
--features=
--uploader-type=restic
Limits:
cpu: 1
memory: 512Mi
Requests:
cpu: 500m
memory: 128Mi
Environment:
VELERO_SCRATCH_DIR: /scratch
VELERO_NAMESPACE: (v1:metadata.namespace)
LD_LIBRARY_PATH: /plugins
GOOGLE_APPLICATION_CREDENTIALS: /credentials/cloud
AWS_SHARED_CREDENTIALS_FILE: /credentials/cloud
AZURE_CREDENTIALS_FILE: /credentials/cloud
ALIBABA_CLOUD_CREDENTIALS_FILE: /credentials/cloud
Mounts:
/credentials from cloud-credentials (rw)
/plugins from plugins (rw)
/scratch from scratch (rw)
Volumes:
plugins:
Type: EmptyDir (a temporary directory that shares a pod's lifetime)
Medium:
SizeLimit: <unset>
scratch:
Type: EmptyDir (a temporary directory that shares a pod's lifetime)
Medium:
SizeLimit: <unset>
cloud-credentials:
Type: Secret (a volume populated by a Secret)
SecretName: cloud-credentials
Optional: false
Conditions:
Type Status Reason
---- ------ ------
Available True MinimumReplicasAvailable
Progressing True NewReplicaSetAvailable
OldReplicaSets: <none>
NewReplicaSet: velero-86f4984c96 (1/1 replicas created)
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal ScalingReplicaSet 2m45s deployment-controller Scaled up replica set velero-86f4984c96 to 1
Here all the variables are set correctly, but the volume path is not set. I am not sure where the issue is, basically.
This might help: when I look at the Velero backup location, it is not set.
master-k8s#masterk8s-virtual-machine:~/velero-v1.2.0-darwin-amd64$ velero backup-location get
NAME PROVIDER BUCKET/PREFIX PHASE LAST VALIDATED ACCESS MODE DEFAULT
default aws velero Unavailable 2023-02-05 12:52:30 +0200 EET ReadWrite true
What I have done so far
I have tried different blogs and Stack Overflow questions to fix this. The issue is not resolved.
What is the main cause?
The main cause of this issue is that I am not able to understand the logs. I want to know why Velero is not able to find the location in my case, whereas in different blogs the same setup works.
What I want to do, or how can you help me?
Please help me to find the main issue. Why does my backup fail with this error?
master-k8s#masterk8s-virtual-machine:~/velero-v1.2.0-darwin-amd64$ velero backup create mytest --include-namespaces postgres-operator
Backup request "mytest" submitted successfully.
Run `velero backup describe mytest` or `velero backup logs mytest` for more details.
master-k8s#masterk8s-virtual-machine:~/velero-v1.2.0-darwin-amd64$ velero backup get
NAME     STATUS   ERRORS   WARNINGS   CREATED                         EXPIRES   STORAGE LOCATION   SELECTOR
mytest   Failed   0        0          2023-02-05 13:00:03 +0200 EET   29d       default            <none>
master-k8s#masterk8s-virtual-machine:~/velero-v1.2.0-darwin-amd64$ velero backup logs mytest
An error occurred: gzip: invalid header
I will be really thankful for your help and support in advance.

Related

Metrics server is currently unable to handle the request

I am new to Kubernetes and was trying to apply horizontal pod autoscaling to my existing application. After following other Stack Overflow answers, I learned that I need to install metrics-server, and I was able to, but somehow it's not working and is unable to handle requests.
Further, I tried a few more things but was unable to resolve the issue. I will really appreciate any help here.
Please let me know of any further details you need to help me :) Thanks in advance.
Steps followed:
kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml
serviceaccount/metrics-server created
clusterrole.rbac.authorization.k8s.io/system:aggregated-metrics-reader created
clusterrole.rbac.authorization.k8s.io/system:metrics-server created
rolebinding.rbac.authorization.k8s.io/metrics-server-auth-reader created
clusterrolebinding.rbac.authorization.k8s.io/metrics-server:system:auth-delegator created
clusterrolebinding.rbac.authorization.k8s.io/system:metrics-server created
service/metrics-server created
deployment.apps/metrics-server created
apiservice.apiregistration.k8s.io/v1beta1.metrics.k8s.io created
kubectl get deploy,svc -n kube-system | egrep metrics-server
deployment.apps/metrics-server 1/1 1 1 2m6s
service/metrics-server ClusterIP 10.32.0.32 <none> 443/TCP 2m6s
kubectl get pods -n kube-system | grep metrics-server
metrics-server-64cf6869bd-6gx88 1/1 Running 0 2m39s
vi ana_hpa.yaml
apiVersion: autoscaling/v2beta2
kind: HorizontalPodAutoscaler
metadata:
name: ana-hpa
spec:
scaleTargetRef:
apiVersion: apps/v1
kind: StatefulSet
name: common-services-auth
minReplicas: 1
maxReplicas: 10
metrics:
- type: Resource
resource:
name: cpu
target:
type: Utilization
averageUtilization: 80
- type: Resource
resource:
name: memory
target:
type: Utilization
averageUtilization: 160
k apply -f ana_hpa.yaml
horizontalpodautoscaler.autoscaling/ana-hpa created
k get hpa
NAME REFERENCE TARGETS MINPODS MAXPODS REPLICAS AGE
ana-hpa StatefulSet/common-services-auth <unknown>/160%, <unknown>/80% 1 10 0 4s
k describe hpa ana-hpa
Name: ana-hpa
Namespace: default
Labels: <none>
Annotations: <none>
CreationTimestamp: Tue, 12 Apr 2022 17:01:25 +0530
Reference: StatefulSet/common-services-auth
Metrics: ( current / target )
resource memory on pods (as a percentage of request): <unknown> / 160%
resource cpu on pods (as a percentage of request): <unknown> / 80%
Min replicas: 1
Max replicas: 10
StatefulSet pods: 3 current / 0 desired
Conditions:
Type Status Reason Message
---- ------ ------ -------
AbleToScale True SucceededGetScale the HPA controller was able to get the target's current scale
ScalingActive False FailedGetResourceMetric the HPA was unable to compute the replica count: failed to get memory utilization: unable to get metrics for resource memory: unable to fetch metrics from resource metrics API: the server is currently unable to handle the request (get pods.metrics.k8s.io)
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning FailedGetResourceMetric 38s (x8 over 2m23s) horizontal-pod-autoscaler failed to get cpu utilization: unable to get metrics for resource cpu: unable to fetch metrics from resource metrics API: the server is currently unable to handle the request (get pods.metrics.k8s.io)
Warning FailedComputeMetricsReplicas 38s (x8 over 2m23s) horizontal-pod-autoscaler invalid metrics (2 invalid out of 2), first error is: failed to get memory utilization: unable to get metrics for resource memory: unable to fetch metrics from resource metrics API: the server is currently unable to handle the request (get pods.metrics.k8s.io)
Warning FailedGetResourceMetric 23s (x9 over 2m23s) horizontal-pod-autoscaler failed to get memory utilization: unable to get metrics for resource memory: unable to fetch metrics from resource metrics API: the server is currently unable to handle the request (get pods.metrics.k8s.io)
kubectl get --raw /apis/metrics.k8s.io/v1beta1
Error from server (ServiceUnavailable): the server is currently unable to handle the request
kubectl get --raw "/apis/metrics.k8s.io/v1beta1/nodes"
Error from server (ServiceUnavailable): the server is currently unable to handle the request
kubectl edit deployments.apps -n kube-system metrics-server
Add hostNetwork: true
deployment.apps/metrics-server edited
kubectl get pods -n kube-system | grep metrics-server
metrics-server-5dc6dbdb8-42hw9 1/1 Running 0 10m
k describe pod metrics-server-5dc6dbdb8-42hw9 -n kube-system
Name: metrics-server-5dc6dbdb8-42hw9
Namespace: kube-system
Priority: 2000000000
Priority Class Name: system-cluster-critical
Node: pusntyn196.apac.avaya.com/10.133.85.196
Start Time: Tue, 12 Apr 2022 17:08:25 +0530
Labels: k8s-app=metrics-server
pod-template-hash=5dc6dbdb8
Annotations: <none>
Status: Running
IP: 10.133.85.196
IPs:
IP: 10.133.85.196
Controlled By: ReplicaSet/metrics-server-5dc6dbdb8
Containers:
metrics-server:
Container ID: containerd://024afb1998dce4c0bd5f4e58f996068ea37982bd501b54fda2ef8d5c1098b4f4
Image: k8s.gcr.io/metrics-server/metrics-server:v0.6.1
Image ID: k8s.gcr.io/metrics-server/metrics-server#sha256:5ddc6458eb95f5c70bd13fdab90cbd7d6ad1066e5b528ad1dcb28b76c5fb2f00
Port: 4443/TCP
Host Port: 4443/TCP
Args:
--cert-dir=/tmp
--secure-port=4443
--kubelet-preferred-address-types=InternalIP,ExternalIP,Hostname
--kubelet-use-node-status-port
--metric-resolution=15s
State: Running
Started: Tue, 12 Apr 2022 17:08:26 +0530
Ready: True
Restart Count: 0
Requests:
cpu: 100m
memory: 200Mi
Liveness: http-get https://:https/livez delay=0s timeout=1s period=10s #success=1 #failure=3
Readiness: http-get https://:https/readyz delay=20s timeout=1s period=10s #success=1 #failure=3
Environment: <none>
Mounts:
/tmp from tmp-dir (rw)
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-g6p4g (ro)
Conditions:
Type Status
Initialized True
Ready True
ContainersReady True
PodScheduled True
Volumes:
tmp-dir:
Type: EmptyDir (a temporary directory that shares a pod's lifetime)
Medium:
SizeLimit: <unset>
kube-api-access-g6p4g:
Type: Projected (a volume that contains injected data from multiple sources)
TokenExpirationSeconds: 3607
ConfigMapName: kube-root-ca.crt
ConfigMapOptional: <nil>
DownwardAPI: true
QoS Class: Burstable
Node-Selectors: kubernetes.io/os=linux
Tolerations: node.kubernetes.io/not-ready:NoExecute op=Exists for 2s
node.kubernetes.io/unreachable:NoExecute op=Exists for 2s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 2m31s default-scheduler Successfully assigned kube-system/metrics-server-5dc6dbdb8-42hw9 to pusntyn196.apac.avaya.com
Normal Pulled 2m32s kubelet Container image "k8s.gcr.io/metrics-server/metrics-server:v0.6.1" already present on machine
Normal Created 2m31s kubelet Created container metrics-server
Normal Started 2m31s kubelet Started container metrics-server
kubectl get --raw /apis/metrics.k8s.io/v1beta1
Error from server (ServiceUnavailable): the server is currently unable to handle the request
kubectl get pods -n kube-system | grep metrics-server
metrics-server-5dc6dbdb8-42hw9 1/1 Running 0 10m
kubectl logs -f metrics-server-5dc6dbdb8-42hw9 -n kube-system
E0412 11:43:54.684784 1 configmap_cafile_content.go:242] kube-system/extension-apiserver-authentication failed with : missing content for CA bundle "client-ca::kube-system::extension-apiserver-authentication::requestheader-client-ca-file"
E0412 11:44:27.001010 1 configmap_cafile_content.go:242] key failed with : missing content for CA bundle "client-ca::kube-system::extension-apiserver-authentication::requestheader-client-ca-file"
k logs -f metrics-server-5dc6dbdb8-42hw9 -n kube-system
I0412 11:38:26.447305 1 serving.go:342] Generated self-signed cert (/tmp/apiserver.crt, /tmp/apiserver.key)
I0412 11:38:26.899459 1 requestheader_controller.go:169] Starting RequestHeaderAuthRequestController
I0412 11:38:26.899477 1 shared_informer.go:240] Waiting for caches to sync for RequestHeaderAuthRequestController
I0412 11:38:26.899518 1 configmap_cafile_content.go:201] "Starting controller" name="client-ca::kube-system::extension-apiserver-authentication::requestheader-client-ca-file"
I0412 11:38:26.899545 1 shared_informer.go:240] Waiting for caches to sync for client-ca::kube-system::extension-apiserver-authentication::requestheader-client-ca-file
I0412 11:38:26.899546 1 configmap_cafile_content.go:201] "Starting controller" name="client-ca::kube-system::extension-apiserver-authentication::client-ca-file"
I0412 11:38:26.899567 1 shared_informer.go:240] Waiting for caches to sync for client-ca::kube-system::extension-apiserver-authentication::client-ca-file
I0412 11:38:26.900480 1 dynamic_serving_content.go:131] "Starting controller" name="serving-cert::/tmp/apiserver.crt::/tmp/apiserver.key"
I0412 11:38:26.900811 1 secure_serving.go:266] Serving securely on [::]:4443
I0412 11:38:26.900854 1 tlsconfig.go:240] "Starting DynamicServingCertificateController"
W0412 11:38:26.900965 1 shared_informer.go:372] The sharedIndexInformer has started, run more than once is not allowed
I0412 11:38:26.999960 1 shared_informer.go:247] Caches are synced for client-ca::kube-system::extension-apiserver-authentication::client-ca-file
I0412 11:38:26.999989 1 shared_informer.go:247] Caches are synced for RequestHeaderAuthRequestController
I0412 11:38:26.999970 1 shared_informer.go:247] Caches are synced for client-ca::kube-system::extension-apiserver-authentication::requestheader-client-ca-file
E0412 11:38:27.000087 1 configmap_cafile_content.go:242] kube-system/extension-apiserver-authentication failed with : missing content for CA bundle "client-ca::kube-system::extension-apiserver-authentication::requestheader-client-ca-file"
E0412 11:38:27.000118 1 configmap_cafile_content.go:242] key failed with : missing content for CA bundle "client-ca::kube-system::extension-apiserver-authentication::requestheader-client-ca-file"
kubectl top nodes
Error from server (ServiceUnavailable): the server is currently unable to handle the request (get nodes.metrics.k8s.io)
kubectl top pods
Error from server (ServiceUnavailable): the server is currently unable to handle the request (get pods.metrics.k8s.io)
Edit metrics server deployment yaml
Add - --kubelet-insecure-tls
k apply -f metric-server-deployment.yaml
serviceaccount/metrics-server unchanged
clusterrole.rbac.authorization.k8s.io/system:aggregated-metrics-reader unchanged
clusterrole.rbac.authorization.k8s.io/system:metrics-server unchanged
rolebinding.rbac.authorization.k8s.io/metrics-server-auth-reader unchanged
clusterrolebinding.rbac.authorization.k8s.io/metrics-server:system:auth-delegator unchanged
clusterrolebinding.rbac.authorization.k8s.io/system:metrics-server unchanged
service/metrics-server unchanged
deployment.apps/metrics-server configured
apiservice.apiregistration.k8s.io/v1beta1.metrics.k8s.io unchanged
kubectl get pods -n kube-system | grep metrics-server
metrics-server-5dc6dbdb8-42hw9 1/1 Running 0 10m
kubectl top pods
Error from server (ServiceUnavailable): the server is currently unable to handle the request (get pods.metrics.k8s.io)
Also tried adding the following to the metrics-server deployment:
command:
- /metrics-server
- --kubelet-insecure-tls
- --kubelet-preferred-address-types=InternalIP
This can easily be resolved by editing the deployment YAML and adding hostNetwork: true after dnsPolicy: ClusterFirst.
kubectl edit deployments.apps -n kube-system metrics-server
insert:
hostNetwork: true
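If you prefer a single command over interactive editing, roughly the same change can be applied with a patch; a sketch, assuming the standard deployment name:
# Merge-patch the pod template so the metrics-server pod uses the host network.
kubectl patch deployment metrics-server -n kube-system --type merge -p '{"spec":{"template":{"spec":{"hostNetwork":true}}}}'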
I hope this helps somebody with a bare-metal cluster:
$ helm --repo https://kubernetes-sigs.github.io/metrics-server/ --kubeconfig=$HOME/.kube/loc-cluster.config -n kube-system --set args='{--kubelet-insecure-tls}' upgrade --install metrics-server metrics-server
$ helm --kubeconfig=$HOME/.kube/loc-cluster.config -n kube-system uninstall metrics-server
Update: I deployed the metrics-server using the same command. Perhaps you can start fresh by removing existing resources and running:
kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml
=======================================================================
It appears the --kubelet-insecure-tls flag was not configured correctly for the pod template in the deployment. The following should fix this:
Edit the existing deployment in the cluster with kubectl edit deployment/metrics-server -nkube-system.
Add the flag to the spec.containers[].args list, so that the deployment looks like this:
...
spec:
containers:
- args:
- --cert-dir=/tmp
- --secure-port=4443
- --kubelet-preferred-address-types=InternalIP,ExternalIP,Hostname
- --kubelet-use-node-status-port
- --metric-resolution=15s
- --kubelet-insecure-tls <=======ADD IT HERE.
image: k8s.gcr.io/metrics-server/metrics-server:v0.6.1
...
Simply save your changes and let the deployment roll out the updated pods. You can use watch -n1 kubectl get deployment/metrics-server -nkube-system and wait for the UP-TO-DATE column to show 1.
Like this:
NAME READY UP-TO-DATE AVAILABLE AGE
metrics-server 1/1 1 1 16m
Verify with kubectl top nodes. It will show something like
NAME CPU(cores) CPU% MEMORY(bytes) MEMORY%
docker-desktop 222m 5% 1600Mi 41%
I've just verified this to work on a local setup. Let me know if this helps :)
Please configure the aggregation layer correctly and carefully; you can use this link for help: https://kubernetes.io/docs/tasks/extend-kubernetes/configure-aggregation-layer/.
apiVersion: apiregistration.k8s.io/v1
kind: APIService
metadata:
name: <name of the registration object>
spec:
group: <API group name this extension apiserver hosts>
version: <API version this extension apiserver hosts>
groupPriorityMinimum: <priority this APIService for this group, see API documentation>
versionPriority: <prioritizes ordering of this version within a group, see API documentation>
service:
namespace: <namespace of the extension apiserver service>
name: <name of the extension apiserver service>
caBundle: <pem encoded ca cert that signs the server cert used by the webhook>
It would be helpful to provide the kubectl version output.
For me, on EKS with helmfile, I had to set the following in the values.yaml of the metrics-server chart:
containerPort: 10250
The value was enforced by default to 4443 for an unknown reason when I first deployed the chart.
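For reference, a rough sketch of the same override applied with plain Helm instead of helmfile (release and chart names assumed to follow the upstream defaults):
# Set the container port the chart exposes; 10250 matches what worked above.
helm upgrade --install metrics-server metrics-server \
  --repo https://kubernetes-sigs.github.io/metrics-server/ \
  --namespace kube-system \
  --set containerPort=10250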
See doc:
https://github.com/kubernetes-sigs/metrics-server/blob/master/charts/metrics-server/values.yaml#L62
https://aws.amazon.com/premiumsupport/knowledge-center/eks-metrics-server/#:~:text=confirm%20that%20your%20security%20groups
Then kubectl top nodes and kubectl describe apiservice v1beta1.metrics.k8s.io were working.
First of all, execute the following command:
kubectl get apiservices
and check the availability (status) of the kube-system/metrics-server service.
In case the availability is True:
Add hostNetwork: true to the spec of your metrics-server deployment by executing the following command:
kubectl edit deployment -n kube-system metrics-server
It should look like the following:
...
spec:
hostNetwork: true
...
Setting hostNetwork to true means that the Pod will have access to the host where it's running.
In case the availability is False (MissingEndpoints):
Download metrics-server:
wget https://github.com/kubernetes-sigs/metrics-server/releases/download/v0.5.0/components.yaml
Remove (legacy) metrics server:
kubectl delete -f components.yaml
Edit downloaded file and add - --kubelet-insecure-tls to args list:
...
labels:
k8s-app: metrics-server
spec:
containers:
- args:
- --cert-dir=/tmp
- --secure-port=443
- --kubelet-preferred-address-types=InternalIP,ExternalIP,Hostname
- --kubelet-use-node-status-port
- --metric-resolution=15s
- --kubelet-insecure-tls # add this line
...
Create the service once again:
kubectl apply -f components.yaml

Kubernetes master doesn't attach FlexVolume

I'm trying to attach the dummy-attachable FlexVolume sample for Kubernetes which seems to initialize normally according to my logs on both the nodes and master:
Loaded volume plugin "flexvolume-k8s/dummy-attachable
But when I try to attach the volume to a pod, the attach method never gets called from the master. The logs from the node read:
flexVolume driver k8s/dummy-attachable: using default GetVolumeName for volume dummy-attachable
operationExecutor.VerifyControllerAttachedVolume started for volume "dummy-attachable"
Operation for "\"flexvolume-k8s/dummy-attachable/dummy-attachable\"" failed. No retries permitted until 2019-04-22 13:42:51.21390334 +0000 UTC m=+4814.674525788 (durationBeforeRetry 500ms). Error: "Volume has not been added to the list of VolumesInUse in the node's volume status for volume \"dummy-attachable\" (UniqueName: \"flexvolume-k8s/dummy-attachable/dummy-attachable\") pod \"nginx-dummy-attachable\"
Here's how I'm attempting to mount the volume:
apiVersion: v1
kind: Pod
metadata:
name: nginx-dummy-attachable
namespace: default
spec:
containers:
- name: nginx-dummy-attachable
image: nginx
volumeMounts:
- name: dummy-attachable
mountPath: /data
ports:
- containerPort: 80
volumes:
- name: dummy-attachable
flexVolume:
driver: "k8s/dummy-attachable"
Here is the output of kubectl describe pod nginx-dummy-attachable:
Name: nginx-dummy-attachable
Namespace: default
Priority: 0
PriorityClassName: <none>
Node: [node id]
Start Time: Wed, 24 Apr 2019 08:03:21 -0400
Labels: <none>
Annotations: kubernetes.io/limit-ranger: LimitRanger plugin set: cpu request for container nginx-dummy-attachable
Status: Pending
IP:
Containers:
nginx-dummy-attachable:
Container ID:
Image: nginx
Image ID:
Port: 80/TCP
Host Port: 0/TCP
State: Waiting
Reason: ContainerCreating
Ready: False
Restart Count: 0
Requests:
cpu: 100m
Environment: <none>
Mounts:
/data from dummy-attachable (rw)
/var/run/secrets/kubernetes.io/serviceaccount from default-token-hcnhj (ro)
Conditions:
Type Status
Initialized True
Ready False
ContainersReady False
PodScheduled True
Volumes:
dummy-attachable:
Type: FlexVolume (a generic volume resource that is provisioned/attached using an exec based plugin)
Driver: k8s/dummy-attachable
FSType:
SecretRef: nil
ReadOnly: false
Options: map[]
default-token-hcnhj:
Type: Secret (a volume populated by a Secret)
SecretName: default-token-hcnhj
Optional: false
QoS Class: Burstable
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute for 300s
node.kubernetes.io/unreachable:NoExecute for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning FailedMount 41s (x6 over 11m) kubelet, [node id] Unable to mount volumes for pod "nginx-dummy-attachable_default([id])": timeout expired waiting for volumes to attach or mount for pod "default"/"nginx-dummy-attachable". list of unmounted volumes=[dummy-attachable]. list of unattached volumes=[dummy-attachable default-token-hcnhj]
I added debug logging to the FlexVolume, so I was able to verify that the attach method was never called on the master node. I'm not sure what I'm missing here.
I don't know if this matters, but the cluster is being launched with KOPS. I've tried with both k8s 1.11 and 1.14 with no success.
So this is a fun one.
Even though kubelet initializes the FlexVolume plugin on master, kube-controller-manager, which is containerized in KOPs, is the application that's actually responsible for attaching the volume to the pod. KOPs doesn't mount the default plugin directory /usr/libexec/kubernetes/kubelet-plugins/volume/exec into the kube-controller-manager pod, so it doesn't know anything about your FlexVolume plugins on master.
There doesn't appear to be a non-hacky way to do this other than to use a different Kubernetes deployment tool until KOPs addresses this problem.
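For reference, kube-controller-manager decides where to look for FlexVolume drivers via its --flex-volume-plugin-dir flag (default /usr/libexec/kubernetes/kubelet-plugins/volume/exec/), so the hacky workaround amounts to mounting that host directory into the kube-controller-manager pod. A sketch of the relevant manifest fragment, purely illustrative since kOps manages this manifest itself:
spec:
  containers:
  - name: kube-controller-manager
    command:
    - kube-controller-manager
    - --flex-volume-plugin-dir=/usr/libexec/kubernetes/kubelet-plugins/volume/exec/
    volumeMounts:
    - name: flexvolume-dir
      mountPath: /usr/libexec/kubernetes/kubelet-plugins/volume/exec/
      readOnly: true
  volumes:
  - name: flexvolume-dir
    hostPath:
      path: /usr/libexec/kubernetes/kubelet-plugins/volume/exec/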

How to check when "kubectl delete" failed with "timeout waiting for ... to be synced"

I have a Kubernetes v1.10.2 cluster and a cronjob on it.
The job config is set to:
failedJobsHistoryLimit: 1
successfulJobsHistoryLimit: 3
But it has created more than ten jobs, which are all successful and not removed automatically.
Now I am trying to delete them manually with kubectl delete job XXX, but the command times out:
$ kubectl delete job XXX
error: timed out waiting for "XXX" to be synced
I want to know how I can investigate such a situation. Is there a log file for the command execution?
I only know the kubectl logs command, but it is not for such a situation.
"kubectl get" shows the job has already finished:
status:
active: 1
completionTime: 2018-08-27T21:20:21Z
conditions:
- lastProbeTime: 2018-08-27T21:20:21Z
lastTransitionTime: 2018-08-27T21:20:21Z
status: "True"
type: Complete
failed: 3
startTime: 2018-08-27T01:00:00Z
succeeded: 1
and "kubectl describe" output as:
$ kubectl describe job test-elk-xxx-1535331600 -ntest
Name: test-elk-xxx-1535331600
Namespace: test
Selector: controller-uid=863a14e3-a994-11e8-8bd7-fa163e23632f
Labels: controller-uid=863a14e3-a994-11e8-8bd7-fa163e23632f
job-name=test-elk-xxx-1535331600
Annotations: <none>
Controlled By: CronJob/test-elk-xxx
Parallelism: 0
Completions: 1
Start Time: Mon, 27 Aug 2018 01:00:00 +0000
Pods Statuses: 1 Running / 1 Succeeded / 3 Failed
Pod Template:
Labels: controller-uid=863a14e3-a994-11e8-8bd7-fa163e23632f
job-name=test-elk-xxx-1535331600
Containers:
xxx:
Image: test-elk-xxx:18.03-3
Port: <none>
Host Port: <none>
Args:
--config
/etc/elasticsearch-xxx/xxx.yml
/etc/elasticsearch-xxx/actions.yml
Limits:
cpu: 100m
memory: 100Mi
Requests:
cpu: 100m
memory: 100Mi
Environment: <none>
Mounts:
/etc/elasticsearch-xxx from xxx-configs (ro)
Volumes:
xxx-configs:
Type: ConfigMap (a volume populated by a ConfigMap)
Name: test-elk-xxx
Optional: false
Events: <none>
It indicates one pod is still running, but I don't know how to figure out the pod name.
Check if kubectl describe pod <pod name> (associated pod of the job) still returns something, which would:
mean the node is still there
include the pod condition
In that state, you can then consider a force deletion.
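A force deletion, if you decide it is safe, would look roughly like this (the pod name is a placeholder, the namespace is the one from the question):
kubectl delete pod <pod-name> -n test --grace-period=0 --force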
I think this is the same as the problem reported in github:
Cannot delete jobs when their associated pods are gone
This is reported by several people, and it is still not fixed.
You can also use the "-v=X" (e.g. -v=8) option for the kubectl command; it will give more detailed debug info.
As taken from https://github.com/kubernetes/kubernetes/issues/43168#issuecomment-375700293
Try using --cascade=false in your delete job command.
It worked for me.
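For example, with the job name from the question (on newer kubectl versions the equivalent is --cascade=orphan):
kubectl delete job XXX --cascade=false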

Kubernetes minikube - can pull from docker registry manually, but rolling deployments won't pull

I have a Kubernetes minikube running a deployment / service.
When I try to update the image to a new version (from my registry on a separate machine) as follows:
kubectl set image deployment/flask-deployment-yaml flask-api-endpoint=192.168.1.201:5000/test_flask:2
It fails with the errors:
Failed to pull image "192.168.1.201:5000/test_flask:2": rpc error:
code = 2 desc = Error: image test_flask:2 not found
If I log on to my minikube server and manually pull the docker image as follows:
$ docker pull 192.168.1.201:5000/test_flask:2
2: Pulling from test_flask
280aca6ddce2: Already exists
3c0df3e97827: Already exists
669c8479e3f7: Pull complete
83323a067779: Pull complete
Digest: sha256:0f9650465284215d48ad0efe06dc888c50928b923ecc982a1b3d6fa38d
Status: Downloaded newer image for 192.168.1.201:5000/test_flask:2
It works, and then my deployment update suddenly succeeds, presumably because the image now exists locally.
I'm not sure why the deployment update doesn't just work straight away...
More deployment details:
Name: flask-deployment-yaml
Namespace: default
CreationTimestamp: Sat, 07 Oct 2017 15:57:24 +0100
Labels: app=front-end
Annotations: deployment.kubernetes.io/revision=2
Selector: app=front-end
Replicas: 4 desired | 4 updated | 4 total | 4 available | 0 unavailable
StrategyType: RollingUpdate
MinReadySeconds: 0
RollingUpdateStrategy: 1 max unavailable, 1 max surge
Pod Template:
Labels: app=front-end
Containers:
flask-api-endpoint:
Image: 192.168.1.201:5000/test_flask:2
Port: 5000/TCP
Environment: <none>
Mounts: <none>
Volumes: <none>
Conditions:
Type Status Reason
---- ------ ------
Available True MinimumReplicasAvailable
OldReplicaSets: <none>
NewReplicaSet: flask-deployment-yaml-1174202895 (4/4 replicas created)
You should either delete your minikube cluster and start it again with the --insecure-registry flag, to allow pulling from insecure registries, or use one that is reachable through localhost and port forward into the minikube cluster, as it won't refuse to pull from localhost. More details here:
- https://github.com/kubernetes/minikube/blob/master/docs/insecure_registry.md
- https://github.com/kubernetes/minikube/issues/604
And more details and illustrations on the problem and how to fix here: https://blog.hasura.io/sharing-a-local-registry-for-minikube-37c7240d0615
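A sketch of the first option, using the registry address from the question (note this recreates the cluster, so existing workloads have to be redeployed):
minikube delete
minikube start --insecure-registry="192.168.1.201:5000"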

Kubernetes NFS volume mount fail with exit status 32

I have a Kubernetes setup installed on my Ubuntu machine. I'm trying to set up an NFS volume and mount it to a container according to this http://kubernetes.io/v1.1/examples/nfs/ document.
nfs service and pod configurations
kind: Service
apiVersion: v1
metadata:
name: nfs-server
spec:
ports:
- port: 2049
selector:
role: nfs-server
---
apiVersion: v1
kind: Pod
metadata:
name: nfs-server
labels:
role: nfs-server
spec:
containers:
- name: nfs-server
image: jsafrane/nfs-data
ports:
- name: nfs
containerPort: 2049
securityContext:
privileged: true
pod configuration to mount nfs volume
apiVersion: v1
kind: Pod
metadata:
name: nfs-web
spec:
containers:
- name: web
image: nginx
ports:
- name: web
containerPort: 80
volumeMounts:
# name must match the volume name below
- name: nfs
mountPath: "/usr/share/nginx/html"
volumes:
- name: nfs
nfs:
# FIXME: use the right hostname
server: 192.168.3.201
path: "/"
When I run kubectl describe pod nfs-web I get the following output, mentioning it was unable to mount the NFS volume. What could be the reason for that?
Name: nfs-web
Namespace: default
Image(s): nginx
Node: 192.168.1.114/192.168.1.114
Start Time: Sun, 06 Dec 2015 08:31:06 +0530
Labels: <none>
Status: Pending
Reason:
Message:
IP:
Replication Controllers: <none>
Containers:
web:
Container ID:
Image: nginx
Image ID:
State: Waiting
Reason: ContainerCreating
Ready: False
Restart Count: 0
Environment Variables:
Conditions:
Type Status
Ready False
Volumes:
nfs:
Type: NFS (an NFS mount that lasts the lifetime of a pod)
Server: 192.168.3.201
Path: /
ReadOnly: false
default-token-nh698:
Type: Secret (a secret that should populate this volume)
SecretName: default-token-nh698
Events:
FirstSeen LastSeen Count From SubobjectPath Reason Message
───────── ──────── ───── ──── ───────────── ────── ───────
36s 36s 1 {scheduler } Scheduled Successfully assigned nfs-web to 192.168.1.114
36s 2s 5 {kubelet 192.168.1.114} FailedMount Unable to mount volumes for pod "nfs-web_default": exit status 32
36s 2s 5 {kubelet 192.168.1.114} FailedSync Error syncing pod, skipping: exit status 32
I had the same problem, and I solved it by installing nfs-common on every Kubernetes node.
apt-get install -y nfs-common
My nodes were installed without nfs-common. Kubernetes will ask each node to mount the NFS into a specific directory to be available to the pod. As mount.nfs was not found, the mounting process failed.
Good luck!
It looks like volumes.nfs.server=192.168.3.201 is incorrectly configured on your client. It should be set to the ClusterIP address of your nfs-server Service.
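A quick way to look up that address, using the Service name from the example above:
kubectl get svc nfs-server -o jsonpath='{.spec.clusterIP}'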
Had the same issue with NFS, which only allowed root mounts.
Fixed by:
a. allow non-root users to mount NFS (on the server).
or
b. in PersistentVolume add
mountOptions:
- nfsvers=4.1
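A minimal sketch of where mountOptions sits in a PersistentVolume, using the server address from the question and placeholder names and sizes:
apiVersion: v1
kind: PersistentVolume
metadata:
  name: nfs-pv
spec:
  capacity:
    storage: 1Gi
  accessModes:
    - ReadWriteMany
  mountOptions:
    - nfsvers=4.1
  nfs:
    server: 192.168.3.201
    path: /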
I fixed this issue by installing nfs-utils on the worker nodes.
In my case the issue was that I hadn't declared the host server of the NFS in the /etc/exports file. After adding an entry in there for my host server, the volume was working correctly.
If you modify the file in any way, then you need to restart the service too:
sudo systemctl restart nfs-kernel-server
An example of an entry in the /etc/exports file;
/var/nfs/home 192.111.222.333(rw,sync,no_subtree_check)
In my case, the issue was that the folder defined in the volume hostPath was not created locally. Once the folder was created on the worker node, the issue was resolved.
Warning FailedMount 3m18s kubelet Unable to attach or mount volumes: unmounted volumes=[temp-volume], unattached volumes=[nfsvol-vre-data temp1-volume consumer1-serviceaccount-token-sdfsdf nfsvol]: timed out waiting for the condition
Warning FailedMount 71s (x10 over 5m20s) kubelet MountVolume.SetUp failed for volume "temp-volume" : hostPath type check failed: /tmp/folder is not a directory
Warning FailedMount 63s kubelet Unable to attach or mount volumes: unmounted volumes=[temp-volume], unattached volumes=[nfsvol nfsvol-vre-data temp1-volume consumer1-serviceaccount-token-sdfsdf]: timed out waiting for the condition
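Alternatively, the hostPath type can be set so the kubelet creates the directory when it is missing; a sketch using the volume name and path from the warnings above:
volumes:
  - name: temp-volume
    hostPath:
      path: /tmp/folder
      type: DirectoryOrCreate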
You need to execute the following on each master and node
sudo yum install nfs-utils -y