Deploying Exceptionless on k8s: error "Back-off restarting failed container"

I'm using the Exceptionless Helm chart; my values.yaml is https://github.com/mypublicuse/myfile/blob/main/el-values.yaml
I ran into two errors.
1:
Error: INSTALLATION FAILED: Deployment.apps "exceptionless-elasticsearch" is invalid: spec.template.spec.initContainers[0].image: Required value
So I edited elasticsearch.yaml and added:
spec:
  initContainers:
    - name: sysctl
      image: mydockerhost/busybox:1.35
With that change, helm install succeeds.
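For context, an init container named sysctl in an Elasticsearch deployment normally exists to raise vm.max_map_count on the node. The chart template most likely already provides the command and security context and was only missing an image value, but for reference, a complete version of such an init container typically looks like this (a sketch, not the chart's exact template):

spec:
  initContainers:
    - name: sysctl
      image: mydockerhost/busybox:1.35
      # Elasticsearch requires vm.max_map_count >= 262144
      command: ["sysctl", "-w", "vm.max_map_count=262144"]
      securityContext:
        privileged: true   # needed to change a kernel setting on the node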
2: After helm install
I see the following pods:
exless-nfsclient-nfs-subdir-external-provisioner-7fc86846fmlbgz 1/1 Running 0 52m
exceptionless-redis-85956947f-7vkpg 1/1 Running 0 49m
exceptionless-app-6547d4d88d-2hkbg 1/1 Running 0 49m
exceptionless-elasticsearch-76f6cc9b9-2jgks 1/1 Running 0 49m
exceptionless-jobs-web-hooks-7bb9d7477c-kpmwv 0/1 CrashLoopBackOff 14 (2m53s ago) 49m
exceptionless-jobs-event-notifications-844cb87665-bd7bt 0/1 CrashLoopBackOff 14 (2m53s ago) 49m
exceptionless-jobs-mail-message-647d6bd897-s8jmq 0/1 CrashLoopBackOff 14 (2m55s ago) 49m
exceptionless-jobs-event-usage-75c6d6d54d-m5rjr 0/1 CrashLoopBackOff 14 (2m46s ago) 49m
exceptionless-jobs-work-item-c74d77b55-th4g7 0/1 CrashLoopBackOff 14 (2m34s ago) 49m
exceptionless-jobs-daily-summary-6c99dfbc87-7zq5k 0/1 CrashLoopBackOff 14 (2m34s ago) 49m
exceptionless-jobs-event-posts-75777759b8-nsmbw 0/1 CrashLoopBackOff 14 (2m32s ago) 49m
exceptionless-jobs-close-inactive-sessions-b49595f49-hmfxm 0/1 CrashLoopBackOff 14 (2m14s ago) 49m
exceptionless-jobs-event-user-descriptions-5c9d5dc768-8h27z 0/1 CrashLoopBackOff 14 (2m16s ago) 49m
exceptionless-jobs-stack-event-count-54ffcfb4b6-gk6mz 0/1 CrashLoopBackOff 14 (2m ago) 49m
exceptionless-jobs-maintain-indexes-27669970-s28cg 0/1 CrashLoopBackOff 5 (94s ago) 4m30s
exceptionless-collector-5c774fd8ff-6ksvx 0/1 CrashLoopBackOff 2 (11s ago) 37s
exceptionless-api-66fc9cc659-zckzz 0/1 CrashLoopBackOff 3 (9s ago) 55s
The api, collector, and jobs pods are not coming up. I need help, thanks!
The pod log is just
Back-off restarting failed container
Yes, that's all of it!
My guess is that the program starts and crashes immediately, so ....
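For reference, these commands usually surface the actual failure reason behind a bare "Back-off restarting failed container" event (a sketch; substitute any of the failing pod names from the listing above):

# Events plus the last state of the container (exit code and reason)
kubectl describe pod exceptionless-api-66fc9cc659-zckzz

# Logs written by the previous (crashed) container instance, if it logged anything before exiting
kubectl logs exceptionless-api-66fc9cc659-zckzz --previous

# Recent events in the namespace, newest last
kubectl get events --sort-by=.lastTimestamp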

Related

MK_ADDON_ENABLE : run callbacks: running callbacks: waiting for app.kubernetes.io/name=ingress-nginx pods: timed out waiting for the condition

I'm trying to enable the ingress addon by typing:
minikube addons enable ingress
I get this error:
`X Fermeture en raison de MK_ADDON_ENABLE : run callbacks: running callbacks: [waiting for app.kubernetes.io/name=ingress-nginx pods: timed out waiting for the condition]` (the French prefix means "Exiting due to MK_ADDON_ENABLE")
I checked the pods with kubectl and it shows:
`NAMESPACE NAME READY STATUS RESTARTS AGE
ingress-nginx ingress-nginx-admission-create-6xqc7 0/1 Completed 0 105m
ingress-nginx ingress-nginx-admission-patch-5qxwp 0/1 Completed 1 105m
ingress-nginx ingress-nginx-controller-5959f988fd-wngnn 0/1 ImageInspectError 0 105m
kube-system coredns-565d847f94-kdcf6 1/1 Running 1 (23m ago) 107m
kube-system etcd-minikube 1/1 Running 1 (23m ago) 107m
kube-system kube-apiserver-minikube 1/1 Running 1 (23m ago) 107m
kube-system kube-controller-manager-minikube 1/1 Running 1 (23m ago) 107m
kube-system kube-proxy-zzrwv 1/1 Running 1 (23m ago) 107m
kube-system kube-scheduler-minikube 1/1 Running 1 (23m ago) 107m
kube-system storage-provisioner 1/1 Running 3 (20m ago) 107m`
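The controller is stuck in ImageInspectError, so the next step is to find out which image the node fails to inspect (a sketch; the pod name is taken from the listing above, and the docker command assumes the Docker runtime):

# Events show the exact image reference and the inspect error
kubectl -n ingress-nginx describe pod ingress-nginx-controller-5959f988fd-wngnn

# List the images actually present inside the minikube node
minikube ssh -- docker images

# A common recovery path: disable the addon and enable it again so the image is pulled fresh
minikube addons disable ingress
minikube addons enable ingress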

Kubernetes CrashLoopBackOff default timing

What are the defaults for the Kubernetes CrashLoopBackOff?
Say, I have a pod:
kubectl run mynginx --image nginx -- echo hello
And I inspect its status:
kubectl get pods -w
NAME READY STATUS RESTARTS AGE
mynginx 0/1 Pending 0 0s
mynginx 0/1 Pending 0 0s
mynginx 0/1 ContainerCreating 0 0s
mynginx 0/1 Completed 0 2s
mynginx 0/1 Completed 1 4s
mynginx 0/1 CrashLoopBackOff 1 5s
mynginx 0/1 Completed 2 20s
mynginx 0/1 CrashLoopBackOff 2 33s
mynginx 0/1 Completed 3 47s
mynginx 0/1 CrashLoopBackOff 3 59s
mynginx 0/1 Completed 4 97s
mynginx 0/1 CrashLoopBackOff 4 109s
This is "expected". Kubernetes starts a pod, it quits "too fast", Kubernetes schedules it again and then Kubernetes sets the state to CrashLoopBackOff.
Now, if i start a pod slightly differently:
kubectl run mynginx3 --image nginx -- /bin/bash -c "sleep 10; echo hello"
I get the following
kubectl get pods -w
NAME READY STATUS RESTARTS AGE
mynginx3 0/1 Pending 0 0s
mynginx3 0/1 Pending 0 0s
mynginx3 0/1 ContainerCreating 0 0s
mynginx3 1/1 Running 0 2s
mynginx3 0/1 Completed 0 12s
mynginx3 1/1 Running 1 14s
mynginx3 0/1 Completed 1 24s
mynginx3 0/1 CrashLoopBackOff 1 36s
mynginx3 1/1 Running 2 38s
mynginx3 0/1 Completed 2 48s
mynginx3 0/1 CrashLoopBackOff 2 62s
mynginx3 1/1 Running 3 75s
mynginx3 0/1 Completed 3 85s
mynginx3 0/1 CrashLoopBackOff 3 96s
mynginx3 1/1 Running 4 2m14s
mynginx3 0/1 Completed 4 2m24s
mynginx3 0/1 CrashLoopBackOff 4 2m38s
This is also expected.
But say I set the sleep to 24 hours: would I still get the same CrashLoopBackOff after the first couple of container exits, and then after each subsequent exit?
Based on these docs:
The restartPolicy applies to all containers in the Pod. restartPolicy only refers to restarts of the containers by the kubelet on the same node. After containers in a Pod exit, the kubelet restarts them with an exponential back-off delay (10s, 20s, 40s, …), that is capped at five minutes. Once a container has executed for 10 minutes without any problems, the kubelet resets the restart backoff timer for that container.
I think that means that anything that executes for longer than 10 minutes before exiting will not trigger a CrashLoopBackOff status.
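A quick way to check this interpretation, assuming the default restartPolicy of Always, is a container that runs for longer than the 10-minute reset window before exiting:

kubectl run mynginx-long --image nginx -- /bin/bash -c "sleep 660; echo hello"
kubectl get pods -w

Since each run lasts about 11 minutes, the backoff timer should be reset before every exit, so the delay between restarts should stay near the initial 10s instead of growing toward the five-minute cap.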

Failed to open topo server on vitess with etcd

I'm running a simple example with Helm. Take a look at the values.yaml file below:
cat << EOF | helm install helm/vitess -n vitess -f -
topology:
  cells:
    - name: 'zone1'
      keyspaces:
        - name: 'vitess'
          shards:
            - name: '0'
              tablets:
                - type: 'replica'
                  vttablet:
                    replicas: 1
      mysqlProtocol:
        enabled: true
        authType: secret
        username: vitess
        passwordSecret: vitess-db-password
      etcd:
        replicas: 3
      vtctld:
        replicas: 1
      vtgate:
        replicas: 3
vttablet:
  dataVolumeClaimSpec:
    storageClassName: nfs-slow
EOF
Here is the output of the currently running pods:
$ kubectl get pods --all-namespaces
NAMESPACE NAME READY STATUS RESTARTS AGE
kube-system coredns-fb8b8dccf-8f5kt 1/1 Running 0 32m
kube-system coredns-fb8b8dccf-qbd6c 1/1 Running 0 32m
kube-system etcd-master1 1/1 Running 0 32m
kube-system kube-apiserver-master1 1/1 Running 0 31m
kube-system kube-controller-manager-master1 1/1 Running 0 32m
kube-system kube-flannel-ds-amd64-bkg9z 1/1 Running 0 32m
kube-system kube-flannel-ds-amd64-q8vh4 1/1 Running 0 32m
kube-system kube-flannel-ds-amd64-vqmnz 1/1 Running 0 32m
kube-system kube-proxy-bd8mf 1/1 Running 0 32m
kube-system kube-proxy-nlc2b 1/1 Running 0 32m
kube-system kube-proxy-x7cd5 1/1 Running 0 32m
kube-system kube-scheduler-master1 1/1 Running 0 32m
kube-system tiller-deploy-8458f6c667-cx2mv 1/1 Running 0 27m
vitess etcd-global-6pwvnv29th 0/1 Init:0/1 0 16m
vitess etcd-operator-84db9bc774-j4wml 1/1 Running 0 30m
vitess etcd-zone1-zwgvd7spzc 0/1 Init:0/1 0 16m
vitess vtctld-86cd78b6f5-zgfqg 0/1 CrashLoopBackOff 7 16m
vitess vtgate-zone1-58744956c4-x8ms2 0/1 CrashLoopBackOff 7 16m
vitess zone1-vitess-0-init-shard-master-mbbph 1/1 Running 0 16m
vitess zone1-vitess-0-replica-0 0/6 Init:CrashLoopBackOff 7 16m
Checking the logs, I see this error:
$ kubectl logs -n vitess vtctld-86cd78b6f5-zgfqg
++ cat
+ eval exec /vt/bin/vtctld '-cell="zone1"' '-web_dir="/vt/web/vtctld"' '-web_dir2="/vt/web/vtctld2/app"' -workflow_manager_init -workflow_manager_use_election -logtostderr=true -stderrthreshold=0 -port=15000 -grpc_port=15999 '-service_map="grpc-vtctl"' '-topo_implementation="etcd2"' '-topo_global_server_address="etcd-global-client.vitess:2379"' -topo_global_root=/vitess/global
++ exec /vt/bin/vtctld -cell=zone1 -web_dir=/vt/web/vtctld -web_dir2=/vt/web/vtctld2/app -workflow_manager_init -workflow_manager_use_election -logtostderr=true -stderrthreshold=0 -port=15000 -grpc_port=15999 -service_map=grpc-vtctl -topo_implementation=etcd2 -topo_global_server_address=etcd-global-client.vitess:2379 -topo_global_root=/vitess/global
ERROR: logging before flag.Parse: E0422 02:35:34.020928 1 syslogger.go:122] can't connect to syslog
F0422 02:35:39.025400 1 server.go:221] Failed to open topo server (etcd2,etcd-global-client.vitess:2379,/vitess/global): grpc: timed out when dialing
I'm running this under Vagrant with 1 master and 2 nodes. I suspect it is an issue with eth1.
The storage is configured to use NFS.
$ kubectl logs etcd-operator-84db9bc774-j4wml
time="2019-04-22T17:26:51Z" level=info msg="skip reconciliation: running ([]), pending ([etcd-zone1-zwgvd7spzc])" cluster-name=etcd-zone1 cluster-namespace=vitess pkg=cluster
time="2019-04-22T17:26:51Z" level=info msg="skip reconciliation: running ([]), pending ([etcd-zone1-zwgvd7spzc])" cluster-name=etcd-global cluster-namespace=vitess pkg=cluster
It appears that etcd is not fully initializing. Note that neither the pod for the global lockserver (etcd-global-6pwvnv29th) nor the local one for cell zone1 (pod etcd-zone1-zwgvd7spzc) is ready.
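Since vtctld cannot open the topo server until etcd-global is reachable, the next step is to find out why the etcd pods are stuck in Init:0/1 (a sketch; take the init container name from the describe output):

# Why is the global etcd member stuck in its init container?
kubectl -n vitess describe pod etcd-global-6pwvnv29th

# Logs of that init container (use the container name reported by describe)
kubectl -n vitess logs etcd-global-6pwvnv29th -c <init-container-name>

# Recent events in the namespace, newest last
kubectl -n vitess get events --sort-by=.lastTimestamp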

Volume is already attached by pod

I installed Kubernetes on Ubuntu on bare metal, with 1 master and 3 workers.
I then deployed Rook and everything worked fine, but when I deployed WordPress on it, the pod got stuck in ContainerCreating. I deleted the WordPress deployment, and now I get this error:
Volume is already attached by pod
default/wordpress-mysql-b78774f44-gvr58. Status Running
#kubectl describe pods wordpress-mysql-b78774f44-bjc2c
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 3m21s default-scheduler Successfully assigned default/wordpress-mysql-b78774f44-bjc2c to worker2
Warning FailedMount 2m57s (x6 over 3m16s) kubelet, worker2 MountVolume.SetUp failed for volume "pvc-dcba7817-553b-11e9-a229-52540076d16c" : mount command failed, status: Failure, reason: Rook: Mount volume failed: failed to attach volume pvc-dcba7817-553b-11e9-a229-52540076d16c for pod default/wordpress-mysql-b78774f44-bjc2c. Volume is already attached by pod default/wordpress-mysql-b78774f44-gvr58. Status Running
Normal Pulling 2m26s kubelet, worker2 Pulling image "mysql:5.6"
Normal Pulled 110s kubelet, worker2 Successfully pulled image "mysql:5.6"
Normal Created 106s kubelet, worker2 Created container mysql
Normal Started 101s kubelet, worker2 Started container mysql
For more information:
# kubectl get pv
NAME CAPACITY ACCESS MODES RECLAIM POLICY STATUS CLAIM STORAGECLASS REASON AGE
pvc-dcba7817-553b-11e9-a229-52540076d16c 20Gi RWO Delete Bound default/mysql-pv-claim rook-ceph-block 13m
pvc-e9797517-553b-11e9-a229-52540076d16c 20Gi RWO Delete Bound default/wp-pv-claim rook-ceph-block 13m
#kubectl get pvc
NAME STATUS VOLUME CAPACITY ACCESS MODES STORAGECLASS AGE
mysql-pv-claim Bound pvc-dcba7817-553b-11e9-a229-52540076d16c 20Gi RWO rook-ceph-block 15m
wp-pv-claim Bound pvc-e9797517-553b-11e9-a229-52540076d16c 20Gi RWO rook-ceph-block 14m
#kubectl get pods --all-namespaces
NAMESPACE NAME READY STATUS RESTARTS AGE
default wordpress-595685cc49-sdbfk 1/1 Running 6 9m58s
default wordpress-mysql-b78774f44-bjc2c 1/1 Running 0 8m14s
kube-system coredns-fb8b8dccf-plnt4 1/1 Running 0 46m
kube-system coredns-fb8b8dccf-xrkql 1/1 Running 0 47m
kube-system etcd-master 1/1 Running 0 46m
kube-system kube-apiserver-master 1/1 Running 0 46m
kube-system kube-controller-manager-master 1/1 Running 1 46m
kube-system kube-flannel-ds-amd64-45bsf 1/1 Running 0 40m
kube-system kube-flannel-ds-amd64-5nxfz 1/1 Running 0 40m
kube-system kube-flannel-ds-amd64-pnln9 1/1 Running 0 40m
kube-system kube-flannel-ds-amd64-sg4pv 1/1 Running 0 40m
kube-system kube-proxy-2xsrn 1/1 Running 0 47m
kube-system kube-proxy-mll8b 1/1 Running 0 42m
kube-system kube-proxy-mv5dw 1/1 Running 0 42m
kube-system kube-proxy-v2jww 1/1 Running 0 42m
kube-system kube-scheduler-master 1/1 Running 0 46m
rook-ceph-system rook-ceph-agent-8pbtv 1/1 Running 0 26m
rook-ceph-system rook-ceph-agent-hsn27 1/1 Running 0 26m
rook-ceph-system rook-ceph-agent-qjqqx 1/1 Running 0 26m
rook-ceph-system rook-ceph-operator-d97564799-9szvr 1/1 Running 0 27m
rook-ceph-system rook-discover-26g84 1/1 Running 0 26m
rook-ceph-system rook-discover-hf7lc 1/1 Running 0 26m
rook-ceph-system rook-discover-jc72g 1/1 Running 0 26m
rook-ceph rook-ceph-mgr-a-68cb58b456-9rrj7 1/1 Running 0 21m
rook-ceph rook-ceph-mon-a-6469b4c68f-cq6mj 1/1 Running 0 23m
rook-ceph rook-ceph-mon-b-d59cfd758-2d2zt 1/1 Running 0 22m
rook-ceph rook-ceph-mon-c-79664b789-wl4t4 1/1 Running 0 21m
rook-ceph rook-ceph-osd-0-8778dbbc-d84mh 1/1 Running 0 19m
rook-ceph rook-ceph-osd-1-84974b86f6-z5c6c 1/1 Running 0 19m
rook-ceph rook-ceph-osd-2-84f9b78587-czx6d 1/1 Running 0 19m
rook-ceph rook-ceph-osd-prepare-worker1-x4rqc 0/2 Completed 0 20m
rook-ceph rook-ceph-osd-prepare-worker2-29jpg 0/2 Completed 0 20m
rook-ceph rook-ceph-osd-prepare-worker3-rkp52 0/2 Completed 0 20m
You are using a standard storage class for your PVC, and its access mode is ReadWriteOnce. This does not mean you can only attach the PVC to one pod, but that it can only be attached to one node.
ReadWriteOnce – the volume can be mounted as read-write by a single node
ReadWriteMany – the volume can be mounted as read-write by many nodes
Here, it seems like you have 2 pods trying to mount this volume. This will be flaky unless you do one of the following:
Schedule both pods on the same node
Use another storage class such as NFS (file-based) to change the access mode to ReadWriteMany
Downscale to 1 pod, so you don't have to share the volume
Right now you have 2 pods trying to mount the same volume: default/wordpress-mysql-b78774f44-gvr58 and default/wordpress-mysql-b78774f44-bjc2c.
Downscaling to 1 pod means you don't have to worry about any of the above at all:
kubectl scale deploy wordpress-mysql --replicas=1
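If you go the ReadWriteMany route instead, the claim would look something like this (a minimal sketch; the storage class name nfs-client is an assumption and must match whatever NFS or other file-based provisioner you actually install):

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: mysql-pv-claim
spec:
  storageClassName: nfs-client   # assumption: an RWX-capable (e.g. NFS) storage class
  accessModes:
    - ReadWriteMany              # can be mounted by pods on different nodes
  resources:
    requests:
      storage: 20Gi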

AWS Kubernetes cluster using kOps - kube-dns and kube-proxy go down

I have created a Kubernetes cluster using kOps on AWS. The cluster gets created without any issues and runs fine for 10-15 hours. I have deployed SAP Vora 2.1 on this cluster. However, after roughly 12-15 hours the cluster runs into problems with kube-proxy and kube-dns: these pods either go down or show as Completed, and they restart a lot. This eventually causes my application pods to fail and the application goes down as well. The application uses Consul for service discovery, but since the Kubernetes foundation services are not working properly, the application does not reach a steady state even if I try to restore the kube-proxy/kube-dns pods.
This is a 3-node cluster (1 master and 2 nodes) set up in fully autoscaling mode. The overlay network is the default kubenet. Below is a snapshot of pod statuses once the system runs into issues:
[root@ip-172-31-18-162 ~]# kubectl get pods --all-namespaces
NAMESPACE NAME READY STATUS RESTARTS AGE
infyvora vora-catalog-1549734119-cfnhz 0/2 CrashLoopBackOff 188 20h
infyvora vora-consul-0 0/1 CrashLoopBackOff 101 20h
infyvora vora-consul-1 1/1 Running 34 20h
infyvora vora-consul-2 0/1 CrashLoopBackOff 95 20h
infyvora vora-deployment-operator-293895365-4b3t6 0/1 Completed 104 20h
infyvora vora-disk-0 1/2 CrashLoopBackOff 187 20h
infyvora vora-dlog-0 0/2 CrashLoopBackOff 226 20h
infyvora vora-dlog-1 1/2 CrashLoopBackOff 155 20h
infyvora vora-doc-store-2451237348-dkrm6 0/2 CrashLoopBackOff 229 20h
infyvora vora-elasticsearch-logging-v1-444540252-mwfrz 0/1 CrashLoopBackOff 100 20h
infyvora vora-elasticsearch-logging-v1-444540252-vrr63 1/1 Running 14 20h
infyvora vora-elasticsearch-retention-policy-137762458-ns5pc 1/1 Running 13 20h
infyvora vora-fluentd-kubernetes-v1.21-9f4pt 1/1 Running 12 20h
infyvora vora-fluentd-kubernetes-v1.21-s2t1j 0/1 CrashLoopBackOff 99 20h
infyvora vora-grafana-2929546178-vrf5h 1/1 Running 13 20h
infyvora vora-graph-435594712-47lcg 0/2 CrashLoopBackOff 157 20h
infyvora vora-kibana-logging-3693794794-2qn86 0/1 CrashLoopBackOff 99 20h
infyvora vora-landscape-2532068267-w1f5n 0/2 CrashLoopBackOff 232 20h
infyvora vora-nats-streaming-1569990702-kcl1v 1/1 Running 13 20h
infyvora vora-prometheus-node-exporter-k4c3g 0/1 CrashLoopBackOff 102 20h
infyvora vora-prometheus-node-exporter-xp511 1/1 Running 13 20h
infyvora vora-prometheus-pushgateway-399610745-tcfk7 0/1 CrashLoopBackOff 103 20h
infyvora vora-prometheus-server-3955170982-xpct0 2/2 Running 24 20h
infyvora vora-relational-376953862-w39tc 0/2 CrashLoopBackOff 237 20h
infyvora vora-security-operator-2514524099-7ld0k 0/1 CrashLoopBackOff 103 20h
infyvora vora-thriftserver-409431919-8c1x9 2/2 Running 28 20h
infyvora vora-time-series-1188816986-f2fbq 1/2 CrashLoopBackOff 184 20h
infyvora vora-tools5tlpt-100252330-mrr9k 0/1 rpc error: code = 4 desc = context deadline exceeded 272 17h
infyvora vora-tools6zr3m-3592177467-n7sxd 0/1 Completed 1 20h
infyvora vora-tx-broker-4168728922-hf8jz 0/2 CrashLoopBackOff 151 20h
infyvora vora-tx-coordinator-3910571185-l0r4n 0/2 CrashLoopBackOff 184 20h
infyvora vora-tx-lock-manager-2734670982-bn7kk 0/2 Completed 228 20h
infyvora vsystem-1230763370-5ckr0 0/1 CrashLoopBackOff 115 20h
infyvora vsystem-auth-1068224543-0g59w 0/1 CrashLoopBackOff 102 20h
infyvora vsystem-vrep-1427606801-zprlr 0/1 CrashLoopBackOff 121 20h
kube-system dns-controller-3110272648-chwrs 1/1 Running 0 22h
kube-system etcd-server-events-ip-172-31-64-102.ap-southeast-1.compute.internal 1/1 Running 0 22h
kube-system etcd-server-ip-172-31-64-102.ap-southeast-1.compute.internal 1/1 Running 0 22h
kube-system kube-apiserver-ip-172-31-64-102.ap-southeast-1.compute.internal 1/1 Running 0 22h
kube-system kube-controller-manager-ip-172-31-64-102.ap-southeast-1.compute.internal 1/1 Running 0 22h
kube-system kube-dns-1311260920-cm1fs 0/3 Completed 309 22h
kube-system kube-dns-1311260920-hm5zd 3/3 Running 39 22h
kube-system kube-dns-autoscaler-1818915203-wmztj 1/1 Running 12 22h
kube-system kube-proxy-ip-172-31-64-102.ap-southeast-1.compute.internal 1/1 Running 0 22h
kube-system kube-proxy-ip-172-31-64-110.ap-southeast-1.compute.internal 0/1 CrashLoopBackOff 98 22h
kube-system kube-proxy-ip-172-31-64-15.ap-southeast-1.compute.internal 1/1 Running 13 22h
kube-system kube-scheduler-ip-172-31-64-102.ap-southeast-1.compute.internal 1/1 Running 0 22h
kube-system tiller-deploy-352283156-97hhb 1/1 Running 34 22h
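For reference, these are the kinds of commands that should narrow down why kube-proxy and kube-dns keep restarting (a sketch; pod and node names are taken from the listing above and need to be adjusted):

# Exit reason and events for the crashing kube-proxy
kubectl -n kube-system describe pod kube-proxy-ip-172-31-64-110.ap-southeast-1.compute.internal

# Logs from the previous (crashed) instances; the kube-dns container name is typically kubedns (check with describe)
kubectl -n kube-system logs kube-proxy-ip-172-31-64-110.ap-southeast-1.compute.internal --previous
kubectl -n kube-system logs kube-dns-1311260920-cm1fs -c kubedns --previous

# Node conditions; MemoryPressure or DiskPressure is a common culprit when many pods restart at once
kubectl describe node ip-172-31-64-110.ap-southeast-1.compute.internal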
Has anyone come across a similar issue with a kOps Kubernetes cluster on AWS? I'd appreciate any pointers to solving this.
Regards,
Deepak