kubectl logs returns nothing (blank) - kubernetes

kubectl logs web-deployment-76789f7f64-s2b4r
returns nothing! The console prompt returns without error.
I have a pod which is in a CrashLoopBackOff cycle (but am unable to diagnose it):
web-deployment-7f985968dc-rhx52 0/1 CrashLoopBackOff 6 7m
I am using Azure AKS with kubectl on Windows. I have been running this cluster for a few months without problems. The container runs fine on my workstation with docker-compose.
kubectl describe doesn't really help much - no useful information there.
kubectl describe pod web-deployment-76789f7f64-s2b4r
Name: web-deployment-76789f7f64-j6z5h
Namespace: default
Node: aks-nodepool1-35657602-0/10.240.0.4
Start Time: Thu, 10 Jan 2019 18:58:35 +0000
Labels: app=stweb
pod-template-hash=3234593920
Annotations: <none>
Status: Running
IP: 10.244.0.25
Controlled By: ReplicaSet/web-deployment-76789f7f64
Containers:
stweb:
Container ID: docker://d1e184a49931bd01804ace51cb44bb4e3479786ec0df6e406546bfb27ab84e31
Image: virasana/stwebapi:2.0.20190110.20
Image ID: docker-pullable://virasana/stwebapi@sha256:2a1405f30c358f1b2a2579c5f3cc19b7d3cc8e19e9e6dc0061bebb732a05d394
Port: 80/TCP
Host Port: 0/TCP
State: Waiting
Reason: CrashLoopBackOff
Last State: Terminated
Reason: Completed
Exit Code: 0
Started: Thu, 10 Jan 2019 18:59:27 +0000
Finished: Thu, 10 Jan 2019 18:59:27 +0000
Ready: False
Restart Count: 3
Environment:
SUPPORT_TICKET_DEPLOY_DB_CONN_STRING_AUTH: <set to the key 'SUPPORT_TICKET_DEPLOY_DB_CONN_STRING_AUTH' in secret 'mssql'> Optional: false
SUPPORT_TICKET_DEPLOY_DB_CONN_STRING: <set to the key 'SUPPORT_TICKET_DEPLOY_DB_CONN_STRING' in secret 'mssql'> Optional: false
SUPPORT_TICKET_DEPLOY_JWT_SECRET: <set to the key 'SUPPORT_TICKET_DEPLOY_JWT_SECRET' in secret 'mssql'> Optional: false
KUBERNETES_PORT_443_TCP_ADDR: kscluster-rgksk8s-2cfe9c-8af10e3f.hcp.eastus.azmk8s.io
KUBERNETES_PORT: tcp://kscluster-rgksk8s-2cfe9c-8af10e3f.hcp.eastus.azmk8s.io:443
KUBERNETES_PORT_443_TCP: tcp://kscluster-rgksk8s-2cfe9c-8af10e3f.hcp.eastus.azmk8s.io:443
KUBERNETES_SERVICE_HOST: kscluster-rgksk8s-2cfe9c-8af10e3f.hcp.eastus.azmk8s.io
Mounts:
/var/run/secrets/kubernetes.io/serviceaccount from default-token-98c7q (ro)
Conditions:
Type Status
Initialized True
Ready False
PodScheduled True
Volumes:
default-token-98c7q:
Type: Secret (a volume populated by a Secret)
SecretName: default-token-98c7q
Optional: false
QoS Class: BestEffort
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute for 300s
node.kubernetes.io/unreachable:NoExecute for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 1m default-scheduler Successfully assigned web-deployment-76789f7f64-j6z5h to aks-nodepool1-35657602-0
Normal SuccessfulMountVolume 1m kubelet, aks-nodepool1-35657602-0 MountVolume.SetUp succeeded for volume "default-token-98c7q"
Normal Pulled 24s (x4 over 1m) kubelet, aks-nodepool1-35657602-0 Container image "virasana/stwebapi:2.0.20190110.20" already present on machine
Normal Created 22s (x4 over 1m) kubelet, aks-nodepool1-35657602-0 Created container
Normal Started 22s (x4 over 1m) kubelet, aks-nodepool1-35657602-0 Started container
Warning BackOff 7s (x6 over 1m) kubelet, aks-nodepool1-35657602-0 Back-off restarting failed container
Any ideas on how to proceed?
Many Thanks!

I am using a multi-stage Docker build, and was building with the wrong target! I had cloned a previous Visual Studio Docker build task, which had the following argument:
--target=test
Because the "test" build stage has no defined entry point, the container was launching and then exiting without logging anything! So that's why kubectl logs returned blank.
I changed this to
--target=final
and all is working!
My Dockerfile looks like this:
FROM microsoft/dotnet:2.1-aspnetcore-runtime AS base
WORKDIR /app
EXPOSE 80
FROM microsoft/dotnet:2.1-sdk AS build
WORKDIR /src
COPY . .
WORKDIR "/src"
RUN dotnet clean ./ST.Web/ST.Web.csproj
RUN dotnet build ./ST.Web/ST.Web.csproj -c Release -o /app
FROM build AS test
RUN dotnet tool install -g dotnet-reportgenerator-globaltool
RUN chmod 755 ./run-tests.sh && ./run-tests.sh
FROM build AS publish
RUN dotnet publish ./ST.Web/ST.Web.csproj -c Release -o /app
FROM base AS final
WORKDIR /app
COPY --from=publish /app .
ENTRYPOINT ["dotnet", "ST.Web.dll"]

That happens because the previous container instance has already been destroyed. Try:
kubectl logs web-deployment-76789f7f64-s2b4r --previous
This will show the logs from the previous instance of the container.
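If the pod has more than one container, you may also need to name the container explicitly; a sketch using the container name from the describe output above:
kubectl logs web-deployment-76789f7f64-s2b4r -c stweb --previous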

In my case, I had published the wrong DLL in my Dockerfile. After correcting it, everything worked as expected.

Related

kubernetes pod (mssql-tools) failing with CrashLoopBackOff error and restarting

I'm using Rancher Desktop for Kubernetes in WSL 2 on Windows 11.
I'm trying to create a pod using the simple yaml:
apiVersion: v1
kind: Pod
metadata:
  name: mssql-tools
  labels:
    name: mssql-tools
spec:
  containers:
  - name: mssql-tools
    image: mcr.microsoft.com/mssql-tools:latest
But it is continuously giving CrashLoopBackOff error.
$ kubectl get pods
NAME READY STATUS RESTARTS AGE
mssql-tools 0/1 CrashLoopBackOff 11 (8s ago) 14m
And here is the result of kubectl describe pod mssql-tools:
$ kubectl describe pod mssql-tools
Name: mssql-tools
Namespace: default
Priority: 0
Service Account: default
Node: desktop-2ohsprk/172.22.97.204
Start Time: Mon, 26 Dec 2022 04:34:19 +0500
Labels: name=mssql-tools
Annotations: <none>
Status: Running
IP: 10.42.0.57
IPs:
IP: 10.42.0.57
Containers:
mssql-tools:
Container ID: docker://76343010f4344a5d26fb35f3b0278271d3336e8e10d695cc22e78520262f34bf
Image: mcr.microsoft.com/mssql-tools:latest
Image ID: docker-pullable://mcr.microsoft.com/mssql-tools@sha256:62556500522072535cb3df2bb5965333dded9be47000473e9e0f84118e248642
Port: <none>
Host Port: <none>
State: Terminated
Reason: Completed
Exit Code: 0
Started: Mon, 26 Dec 2022 04:46:20 +0500
Finished: Mon, 26 Dec 2022 04:46:20 +0500
Last State: Terminated
Reason: Completed
Exit Code: 0
Started: Mon, 26 Dec 2022 04:45:51 +0500
Finished: Mon, 26 Dec 2022 04:45:51 +0500
Ready: False
Restart Count: 9
Environment: <none>
Mounts:
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-wkqlg (ro)
Conditions:
Type Status
Initialized True
Ready False
ContainersReady False
PodScheduled True
Volumes:
kube-api-access-wkqlg:
Type: Projected (a volume that contains injected data from multiple sources)
TokenExpirationSeconds: 3607
ConfigMapName: kube-root-ca.crt
ConfigMapOptional: <nil>
DownwardAPI: true
QoS Class: BestEffort
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 12m default-scheduler Successfully assigned default/mssql-tools to desktop-2ohsprk
Normal Pulled 12m kubelet Successfully pulled image "mcr.microsoft.com/mssql-tools:latest" in 1.459473213s
Normal Pulled 12m kubelet Successfully pulled image "mcr.microsoft.com/mssql-tools:latest" in 823.403008ms
Normal Pulled 11m kubelet Successfully pulled image "mcr.microsoft.com/mssql-tools:latest" in 835.697509ms
Normal Pulled 11m kubelet Successfully pulled image "mcr.microsoft.com/mssql-tools:latest" in 873.802598ms
Normal Created 11m (x4 over 12m) kubelet Created container mssql-tools
Normal Started 11m (x4 over 12m) kubelet Started container mssql-tools
Normal Pulling 10m (x5 over 12m) kubelet Pulling image "mcr.microsoft.com/mssql-tools:latest"
Normal Pulled 10m kubelet Successfully pulled image "mcr.microsoft.com/mssql-tools:latest" in 740.64559ms
Warning BackOff 6m56s (x25 over 11m) kubelet Back-off restarting failed container
Normal SandboxChanged 50s kubelet Pod sandbox changed, it will be killed and re-created.
Normal Pulled 48s kubelet Successfully pulled image "mcr.microsoft.com/mssql-tools:latest" in 951.332457ms
Normal Pulled 32s kubelet Successfully pulled image "mcr.microsoft.com/mssql-tools:latest" in 828.839917ms
Normal Pulling 4s (x3 over 49s) kubelet Pulling image "mcr.microsoft.com/mssql-tools:latest"
Normal Pulled 3s kubelet Successfully pulled image "mcr.microsoft.com/mssql-tools:latest" in 713.951656ms
Normal Created 3s (x3 over 48s) kubelet Created container mssql-tools
Normal Started 3s (x3 over 48s) kubelet Started container mssql-tools
Warning BackOff 2s (x5 over 47s) kubelet Back-off restarting failed container
The same container works perfectly if I run it via docker and I can use its shell to execute sqlcmd properly.
I can't figure out any reason for this.
Any help would be really appreciated.
Thanks
CrashLoopBackOff is a common error indicating that a pod failed to start and continued to fail repeatedly as Kubernetes tried to restart it.
To troubleshoot this issue, follow the steps below (the commands are sketched after this list):
Check for "Back-off restarting failed container" by running kubectl describe pod [name].
If you get Liveness probe failed and Back-off restarting failed container messages from the kubelet, this indicates the container is not responding and is in the process of restarting.
Check the previous container instance. Run kubectl get pods to identify the pod that causes the CrashLoopBackOff error, then run kubectl logs --previous --tail 10 to get the last ten log lines from it.
Check the deployment logs by running: kubectl logs -f deploy/<deployment-name> -n <namespace>
Refer to this link for more detailed troubleshooting steps.
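A sketch of those commands with the names from this question filled in (the deployment name and namespace are placeholders, since they are not given here):
kubectl describe pod mssql-tools
kubectl logs mssql-tools --previous --tail 10
kubectl logs -f deploy/<deployment-name> -n <namespace>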
So after trying and digging through multiple options, it finally worked by running sleep 3600000 as the container's command - the mssql-tools image has no long-running process of its own, so without this the container exits immediately and Kubernetes keeps restarting it.
Here is the working yaml:
apiVersion: v1
kind: Pod
metadata:
  name: mssql-tools
  labels:
    name: mssql-tools
spec:
  containers:
  - name: mssql-tools
    image: mcr.microsoft.com/mssql-tools:latest
    command: ["sleep"]
    args:
    - "3600000"
    imagePullPolicy: IfNotPresent
The command and arguments can also be written like this:
apiVersion: v1
...
...
spec:
  containers:
  - name: mssql-tools
    image: mcr.microsoft.com/mssql-tools:latest
    command:
    - sleep
    - "3600000"
...
And by the way, you can also deploy a container by passing a command on the kubectl run command line, e.g.:
kubectl run mssql --image=mcr.microsoft.com/mssql-tools --command sleep 3600000 -n myNameSpace
Note: You can omit -n myNameSpace if you are not deploying to a specific namespace or are deploying to the default namespace.
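Once the sleep command keeps the container alive, you can open a shell in it and run sqlcmd just as you would with plain Docker. A sketch - the sqlcmd path is the usual location in this image, and the server/credentials are placeholders:
kubectl exec -it mssql-tools -- /opt/mssql-tools/bin/sqlcmd -S <server> -U <user> -P <password>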

why the kubernetes pod shows Back-off restarting failed container [duplicate]

This question already has answers here:
How can I keep a container running on Kubernetes?
(14 answers)
My kubernetes pods keep crashing with "CrashLoopBackOff" but I can't find any log
(21 answers)
Closed 2 years ago.
I want to build a troubleshooting pod; this is my Dockerfile:
FROM alpine:3.11
MAINTAINER jiangxiaoqiang (jiangtingqiang@gmail.com)
ENV LANG=en_US.UTF-8 \
    LC_ALL=en_US.UTF-8 \
    TZ=Asia/Shanghai
RUN ln -snf /usr/share/zoneinfo/$TZ /etc/localtime \
    && echo $TZ > /etc/timezone \
    && apk add --no-cache curl jq \
    nmap \
    bind-tools \
    busybox-extras \
    bash
CMD ["/bin/bash","-l"]
But when I start it in the Kubernetes cluster, it shows Back-off restarting failed container and keeps restarting all the time. It is such a simple Docker container - why am I getting this message? This is the describe output:
[root@k8smaster ~]# kubectl describe pod ts-7d754488b9-jqqh9
Name: ts-7d754488b9-jqqh9
Namespace: default
Priority: 0
Node: k8sslave2/192.168.31.31
Start Time: Wed, 02 Sep 2020 12:28:48 -0400
Labels: k8s-app=ts
pod-template-hash=7d754488b9
Annotations: cni.projectcalico.org/podIP: 10.11.125.135/32
Status: Running
IP: 10.11.125.135
IPs:
IP: 10.11.125.135
Controlled By: ReplicaSet/ts-7d754488b9
Containers:
ts:
Container ID: docker://0c810ed8f8ec1cde6c0249edde59fc28a169d5730e87c423403f802cd12df6dd
Image: registry.cn-shanghai.aliyuncs.com/jiangxiaoqiang/dolphin/k8s-ts:v0.0.1
Image ID: docker-pullable://registry.cn-shanghai.aliyuncs.com/jiangxiaoqiang/dolphin/k8s-ts@sha256:68edaed45c1fadee71abbe7bdaad23f2400f352f1b6309142689a197367f3ae9
Port: <none>
Host Port: <none>
State: Waiting
Reason: CrashLoopBackOff
Last State: Terminated
Reason: Completed
Exit Code: 0
Started: Wed, 02 Sep 2020 12:30:13 -0400
Finished: Wed, 02 Sep 2020 12:30:13 -0400
Ready: False
Restart Count: 4
Environment: <none>
Mounts:
/var/run/secrets/kubernetes.io/serviceaccount from default-token-79w95 (ro)
Conditions:
Type Status
Initialized True
Ready False
ContainersReady False
PodScheduled True
Volumes:
default-token-79w95:
Type: Secret (a volume populated by a Secret)
SecretName: default-token-79w95
Optional: false
QoS Class: BestEffort
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute for 300s
node.kubernetes.io/unreachable:NoExecute for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled <unknown> default-scheduler Successfully assigned default/ts-7d754488b9-jqqh9 to k8sslave2
Normal Created 96s (x4 over 2m17s) kubelet, k8sslave2 Created container ts
Normal Started 95s (x4 over 2m16s) kubelet, k8sslave2 Started container ts
Warning BackOff 69s (x7 over 2m15s) kubelet, k8sslave2 Back-off restarting failed container
Normal Pulling 54s (x5 over 2m17s) kubelet, k8sslave2 Pulling image "registry.cn-shanghai.aliyuncs.com/jiangxiaoqiang/dolphin/k8s-ts:v0.0.1"
Normal Pulled 54s (x5 over 2m17s) kubelet, k8sslave2 Successfully pulled image "registry.cn-shanghai.aliyuncs.com/jiangxiaoqiang/dolphin/k8s-ts:v0.0.1"
The container being Completed means it has finished its execution task. If you want the container to run for a specific time, then pass e.g. sleep 3600 as an argument, or you can use restartPolicy: Never in your deployment file.
Something like this:
spec:
  containers:
  - image: alpine
    command:
    - /bin/sh
    - "-c"
    - "sleep 60m"
    imagePullPolicy: Always
    name: alpine
  restartPolicy: Never
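Applied to the troubleshooting image from the question, a complete Pod manifest could look like the sketch below (image name copied from the describe output; the sleep duration is arbitrary):
apiVersion: v1
kind: Pod
metadata:
  name: ts
spec:
  containers:
  - name: ts
    image: registry.cn-shanghai.aliyuncs.com/jiangxiaoqiang/dolphin/k8s-ts:v0.0.1
    command:
    - /bin/sh
    - "-c"
    - "sleep 60m"
You can then kubectl exec -it ts -- bash into the pod to do your troubleshooting, since bash is installed in the image.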

GCP AI Platform - Pipelines - Clusters - Does not have minimum availability

I can't create pipelines. I can't even load the samples / tutorials on the AI Platform Pipelines Dashboard because it doesn't seem to be able to proxy to whatever it needs to.
An error occurred
Error occured while trying to proxy to: ...
I looked into the cluster's details and found 3 components with errors:
Deployment metadata-grpc-deployment Does not have minimum availability
Deployment ml-pipeline Does not have minimum availability
Deployment ml-pipeline-persistenceagent Does not have minimum availability
Creating the clusters involves approx. 3 clicks in GCP Kubernetes Engine, so I don't think I messed up this step.
Anyone have an idea of how to achieve "minimum availability"?
UPDATE 1
Nodes have adequate resources and are Ready.
YAML file looks good.
I have 2 clusters in different regions/zones and both have the deployment errors listed above.
2 Pods are not ok.
Name: ml-pipeline-65479485c8-mcj9x
Namespace: default
Priority: 0
Node: gke-cluster-3-default-pool-007784cb-qcsn/10.150.0.2
Start Time: Thu, 17 Sep 2020 22:15:19 +0000
Labels: app=ml-pipeline
app.kubernetes.io/name=kubeflow-pipelines-3
pod-template-hash=65479485c8
Annotations: kubernetes.io/limit-ranger: LimitRanger plugin set: cpu request for container ml-pipeline-api-server
Status: Running
IP: 10.4.0.8
IPs:
IP: 10.4.0.8
Controlled By: ReplicaSet/ml-pipeline-65479485c8
Containers:
ml-pipeline-api-server:
Container ID: ...
Image: ...
Image ID: ...
Ports: 8888/TCP, 8887/TCP
Host Ports: 0/TCP, 0/TCP
State: Running
Started: Fri, 18 Sep 2020 10:27:31 +0000
Last State: Terminated
Reason: Error
Exit Code: 255
Started: Fri, 18 Sep 2020 10:20:38 +0000
Finished: Fri, 18 Sep 2020 10:27:31 +0000
Ready: False
Restart Count: 98
Requests:
cpu: 100m
Liveness: exec [wget -q -S -O - http://localhost:8888/apis/v1beta1/healthz] delay=3s timeout=2s period=5s #success=1 #failure=3
Readiness: exec [wget -q -S -O - http://localhost:8888/apis/v1beta1/healthz] delay=3s timeout=2s period=5s #success=1 #failure=3
Environment:
HAS_DEFAULT_BUCKET: true
BUCKET_NAME:
PROJECT_ID: <set to the key 'project_id' of config map 'gcp-default-config'> Optional: false
POD_NAMESPACE: default (v1:metadata.namespace)
DEFAULTPIPELINERUNNERSERVICEACCOUNT: pipeline-runner
OBJECTSTORECONFIG_SECURE: false
OBJECTSTORECONFIG_BUCKETNAME:
DBCONFIG_DBNAME: kubeflow_pipelines_3_pipeline
DBCONFIG_USER: <set to the key 'username' in secret 'mysql-credential'> Optional: false
DBCONFIG_PASSWORD: <set to the key 'password' in secret 'mysql-credential'> Optional: false
Mounts:
/var/run/secrets/kubernetes.io/serviceaccount from ml-pipeline-token-77xl8 (ro)
Conditions:
Type Status
Initialized True
Ready False
ContainersReady False
PodScheduled True
Volumes:
ml-pipeline-token-77xl8:
Type: Secret (a volume populated by a Secret)
SecretName: ml-pipeline-token-77xl8
Optional: false
QoS Class: Burstable
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning BackOff 52m (x409 over 11h) kubelet, gke-cluster-3-default-pool-007784cb-qcsn Back-off restarting failed container
Warning Unhealthy 31m (x94 over 12h) kubelet, gke-cluster-3-default-pool-007784cb-qcsn Readiness probe failed:
Warning Unhealthy 31m (x29 over 10h) kubelet, gke-cluster-3-default-pool-007784cb-qcsn (combined from similar events): Readiness probe failed: cannot exec in a stopped state: unknown
Warning Unhealthy 17m (x95 over 12h) kubelet, gke-cluster-3-default-pool-007784cb-qcsn Liveness probe failed:
Normal Pulled 7m26s (x97 over 12h) kubelet, gke-cluster-3-default-pool-007784cb-qcsn Container image "gcr.io/cloud-marketplace/google-cloud-ai-platform/kubeflow-pipelines/apiserver:1.0.0" already present on machine
Warning Unhealthy 75s (x78 over 12h) kubelet, gke-cluster-3-default-pool-007784cb-qcsn Liveness probe errored: rpc error: code = DeadlineExceeded desc = context deadline exceeded
And the other pod:
Name: ml-pipeline-persistenceagent-67db8b8964-mlbmv
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning BackOff 32s (x2238 over 12h) kubelet, gke-cluster-3-default-pool-007784cb-qcsn Back-off restarting failed container
SOLUTION
Do not let Google handle any storage. Uncheck "Use managed storage" and set up your own artifact collections manually. You don't actually need to enter anything in these fields since the pipeline will be launched anyway.
The Does not have minimum availability error is generic. There could be many issues that trigger it, so you need to dig deeper to find the actual problem. Here are some possible causes (the relevant commands are sketched after this list):
Insufficient resources: check whether your nodes have adequate resources (CPU/memory). If the nodes are OK, then check the Pod's status.
Liveness probe and/or readiness probe failure: run kubectl describe pod <pod-name> to check whether they failed and why.
Deployment misconfiguration: review your deployment YAML file for errors or leftovers from previous configurations.
You can also try to wait a bit, as sometimes it takes some time to deploy everything, and/or try changing your region/zone.
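A sketch of the commands for working through that checklist (pod name taken from the describe output above):
kubectl describe nodes                                # allocatable vs requested CPU/memory
kubectl get pods -n default                           # which pods are failing
kubectl describe pod ml-pipeline-65479485c8-mcj9x     # probe failures, events
kubectl logs ml-pipeline-65479485c8-mcj9x --previous  # logs from the crashed instance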

CockroachDB Cluster on Kubernetes Pods Crashing

I'm trying to install a CockroachDB Helm chart on a 2 node Kubernetes cluster using this command:
helm install my-release --set statefulset.replicas=2 stable/cockroachdb
I have already created 2 persistent volumes:
NAME CAPACITY ACCESS MODES RECLAIM POLICY STATUS CLAIM STORAGECLASS REASON AGE
pv00001 100Gi RWO Recycle Bound default/datadir-my-release-cockroachdb-0 11m
pv00002 100Gi RWO Recycle Bound default/datadir-my-release-cockroachdb-1 11m
I'm getting a weird error and I'm new to Kubernetes so I'm not sure what I'm doing wrong. I've tried creating a StorageClass and using it with my PVs but then the CockroachDB PVCs won't bind to them. I suspect there may be something wrong with my PV setup?
I've tried using kubectl logs but the only error I'm seeing is this:
standard_init_linux.go:211: exec user process caused "exec format error"
and the pods are crashing over and over:
NAME READY STATUS RESTARTS AGE
my-release-cockroachdb-0 0/1 Pending 0 11m
my-release-cockroachdb-1 0/1 CrashLoopBackOff 7 11m
my-release-cockroachdb-init-tfcks 0/1 CrashLoopBackOff 5 5m29s
Any idea why the pods are crashing?
Here's kubectl describe for the init pod:
Name: my-release-cockroachdb-init-tfcks
Namespace: default
Priority: 0
Node: axon/192.168.1.7
Start Time: Sat, 04 Apr 2020 00:22:19 +0100
Labels: app.kubernetes.io/component=init
app.kubernetes.io/instance=my-release
app.kubernetes.io/name=cockroachdb
controller-uid=54c7c15d-eb1c-4392-930a-d9b8e9225a45
job-name=my-release-cockroachdb-init
Annotations: <none>
Status: Running
IP: 10.44.0.1
IPs:
IP: 10.44.0.1
Controlled By: Job/my-release-cockroachdb-init
Containers:
cluster-init:
Container ID: docker://82a062c6862a9fd5047236feafe6e2654ec1f6e3064fd0513341a1e7f36eaed3
Image: cockroachdb/cockroach:v19.2.4
Image ID: docker-pullable://cockroachdb/cockroach@sha256:511b6d09d5bc42c7566477811a4e774d85d5689f8ba7a87a114b96d115b6149b
Port: <none>
Host Port: <none>
Command:
/bin/bash
-c
while true; do initOUT=$(set -x; /cockroach/cockroach init --insecure --host=my-release-cockroachdb-0.my-release-cockroachdb:26257 2>&1); initRC="$?"; echo $initOUT; [[ "$initRC" == "0" ]] && exit 0; [[ "$initOUT" == *"cluster has already been initialized"* ]] && exit 0; sleep 5; done
State: Waiting
Reason: CrashLoopBackOff
Last State: Terminated
Reason: Error
Exit Code: 1
Started: Sat, 04 Apr 2020 00:28:04 +0100
Finished: Sat, 04 Apr 2020 00:28:04 +0100
Ready: False
Restart Count: 6
Environment: <none>
Mounts:
/var/run/secrets/kubernetes.io/serviceaccount from default-token-cz2sn (ro)
Conditions:
Type Status
Initialized True
Ready False
ContainersReady False
PodScheduled True
Volumes:
default-token-cz2sn:
Type: Secret (a volume populated by a Secret)
SecretName: default-token-cz2sn
Optional: false
QoS Class: BestEffort
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute for 300s
node.kubernetes.io/unreachable:NoExecute for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled <unknown> default-scheduler Successfully assigned default/my-release-cockroachdb-init-tfcks to axon
Normal Pulled 5m9s (x5 over 6m45s) kubelet, axon Container image "cockroachdb/cockroach:v19.2.4" already present on machine
Normal Created 5m8s (x5 over 6m45s) kubelet, axon Created container cluster-init
Normal Started 5m8s (x5 over 6m44s) kubelet, axon Started container cluster-init
Warning BackOff 92s (x26 over 6m42s) kubelet, axon Back-off restarting failed container
When pods crash, the most important things to troubleshoot are their descriptions (kubectl describe) and their logs.
The logs of the failed pod show that the architecture of the cockroach image doesn't match the node's architecture.
Run kubectl get po -o wide to see which nodes cockroach runs on and check their architecture.
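For example, something along these lines (this assumes the nodes carry the standard kubernetes.io/arch label and that you have Docker available locally for inspecting the image manifest):
kubectl get po -o wide                                   # which node each cockroach pod landed on
kubectl get nodes -L kubernetes.io/arch                  # node architectures
docker manifest inspect cockroachdb/cockroach:v19.2.4    # architectures the image is published for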
A 2-node CockroachDB cluster is an anti-pattern. You need 3 or more nodes to avoid data or cluster-wide unavailability when a single node fails. Consider checking out these videos explaining how data in CockroachDB is organized and then how the nodes in a cluster work together to keep data available in the face of node failure.
Only if you have 3 nodes (or more) will you not risk losing data if any of the nodes gets corrupted. Apart from that, it's easier to explain how to do it right than to find out what went wrong, and to find out what went wrong one must go through the logs.
If you attach the log, I can take a look.
I also wrote a detailed guide that may address the "doing it right" part of my answer. I elaborated even more about the entire process here.
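If you do add a third node, the install command from the question only needs the replica count bumped, e.g.:
helm install my-release --set statefulset.replicas=3 stable/cockroachdb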

Minikube running out of space and failing despite --disk-size flag

I am trying to run a docker container registry in Minikube for testing a CSI driver that I am writing.
I am running Minikube on macOS and am trying to use the following minikube start command: minikube start --vm-driver=hyperkit --disk-size=40g. I have tried with both the kubeadm and localkube bootstrappers and with the virtualbox vm-driver.
This is the resource definition I am using for the registry pod deployment.
---
apiVersion: v1
kind: Pod
metadata:
  name: registry
  labels:
    app: registry
  namespace: docker-registry
spec:
  containers:
  - name: registry
    image: registry:2
    imagePullPolicy: Always
    ports:
    - containerPort: 5000
    volumeMounts:
    - mountPath: /var/lib/registry
      name: registry-data
  volumes:
  - hostPath:
      path: /var/lib/kubelet/plugins/csi-registry
      type: DirectoryOrCreate
    name: registry-data
I attempt to create it using kubectl apply -f registry-setup.yaml. Before running this my minikube cluster reports itself as ready and with all the normal minikube containers running.
However, this fails to run and upon running kubectl describe pod, I see the following message:
Name: registry
Namespace: docker-registry
Node: minikube/192.168.64.43
Start Time: Wed, 08 Aug 2018 12:24:27 -0700
Labels: app=registry
Annotations: kubectl.kubernetes.io/last-applied-configuration={"apiVersion":"v1","kind":"Pod","metadata":{"annotations":{},"labels":{"app":"registry"},"name":"registry","namespace":"docker-registry"},"spec":{"cont...
Status: Running
IP: 172.17.0.2
Containers:
registry:
Container ID: docker://42e5193ac563c2b2e2a2b381c91350d30f7e7c5009a30a5977d33b403a374e7f
Image: registry:2
...
TRUNCATED FOR SPACE
...
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 1m default-scheduler Successfully assigned registry to minikube
Normal SuccessfulMountVolume 1m kubelet, minikube MountVolume.SetUp succeeded for volume "registry-data"
Normal SuccessfulMountVolume 1m kubelet, minikube MountVolume.SetUp succeeded for volume "default-token-kq5mq"
Normal Pulling 1m kubelet, minikube pulling image "registry:2"
Normal Pulled 1m kubelet, minikube Successfully pulled image "registry:2"
Normal Created 1m kubelet, minikube Created container
Normal Started 1m kubelet, minikube Started container
...
TRUNCATED
...
Name: storage-provisioner
Namespace: kube-system
Node: minikube/192.168.64.43
Start Time: Wed, 08 Aug 2018 12:24:38 -0700
Labels: addonmanager.kubernetes.io/mode=Reconcile
integration-test=storage-provisioner
Annotations: kubectl.kubernetes.io/last-applied-configuration={"apiVersion":"v1","kind":"Pod","metadata":{"annotations":{},"labels":{"addonmanager.kubernetes.io/mode":"Reconcile","integration-test":"storage-provis...
Status: Pending
IP: 192.168.64.43
Containers:
storage-provisioner:
Container ID:
Image: gcr.io/k8s-minikube/storage-provisioner:v1.8.1
Image ID:
Port: <none>
Host Port: <none>
Command:
/storage-provisioner
State: Waiting
Reason: ErrImagePull
Ready: False
Restart Count: 0
Environment: <none>
Mounts:
/tmp from tmp (rw)
/var/run/secrets/kubernetes.io/serviceaccount from storage-provisioner-token-sb5hz (ro)
Conditions:
Type Status
Initialized True
Ready False
PodScheduled True
Volumes:
tmp:
Type: HostPath (bare host directory volume)
Path: /tmp
HostPathType: Directory
storage-provisioner-token-sb5hz:
Type: Secret (a volume populated by a Secret)
SecretName: storage-provisioner-token-sb5hz
Optional: false
QoS Class: BestEffort
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute for 300s
node.kubernetes.io/unreachable:NoExecute for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 1m default-scheduler Successfully assigned storage-provisioner to minikube
Normal SuccessfulMountVolume 1m kubelet, minikube MountVolume.SetUp succeeded for volume "tmp"
Normal SuccessfulMountVolume 1m kubelet, minikube MountVolume.SetUp succeeded for volume "storage-provisioner-token-sb5hz"
Normal Pulling 23s (x3 over 1m) kubelet, minikube pulling image "gcr.io/k8s-minikube/storage-provisioner:v1.8.1"
Warning Failed 21s (x3 over 1m) kubelet, minikube Failed to pull image "gcr.io/k8s-minikube/storage-provisioner:v1.8.1": rpc error: code = Unknown desc = failed to register layer: Error processing tar file(exit status 1): write /storage-provisioner: no space left on device
Warning Failed 21s (x3 over 1m) kubelet, minikube Error: ErrImagePull
Normal BackOff 7s (x3 over 1m) kubelet, minikube Back-off pulling image "gcr.io/k8s-minikube/storage-provisioner:v1.8.1"
Warning Failed 7s (x3 over 1m) kubelet, minikube Error: ImagePullBackOff
------------------------------------------------------------
...
So while the registry container starts up correctly, a few of the other minikube services (including dns, the http ingress service, etc.) begin to fail with reasons such as: write /storage-provisioner: no space left on device. Despite allocating a 40GB disk size to minikube, it seems as though minikube is trying to write to rootfs or devtmpfs (depending on the vm-driver), which has only about 1GB of space.
$ df -h
Filesystem Size Used Avail Use% Mounted on
rootfs 919M 713M 206M 78% /
devtmpfs 919M 0 919M 0% /dev
tmpfs 996M 0 996M 0% /dev/shm
tmpfs 996M 8.9M 987M 1% /run
tmpfs 996M 0 996M 0% /sys/fs/cgroup
tmpfs 996M 8.0K 996M 1% /tmp
/dev/sda1 34G 1.3G 30G 4% /mnt/sda1
Is there a way to make minikube actually use the 34GB of space that was allocated to /mnt/sda1 instead of rootfs when pulling images and creating containers?
Thanks in advance for any help!
You need to configure your Minikube virtual machine to use /dev/sda1 instead of / for Docker. To log in to it, use the minikube ssh command.
Then you have two options (a sketch of the first one is shown below):
Mount /dev/sda1 to /var/lib/docker, but don't forget to copy the content of the original /var/lib/docker to /mnt/sda1 before that.
Reconfigure Docker to use /mnt/sda1 instead of /var/lib/docker for storing images. Look through this link for more information about it.
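A rough sketch of the first option, run from inside the VM (this assumes the Minikube ISO manages Docker with systemd and uses the paths shown in the df output above; the bind mount will not survive a VM restart unless you persist it):
minikube ssh
sudo systemctl stop docker
sudo mkdir -p /mnt/sda1/docker
sudo cp -a /var/lib/docker/. /mnt/sda1/docker/   # copy existing Docker data first
sudo mount --bind /mnt/sda1/docker /var/lib/docker
sudo systemctl start docker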
You can also use the minikube --docker-opt option to set the --data-root option of the dockerd daemon running inside minikube. --docker-opt can be used as a pass-through for any parameter to dockerd.
For example, in the case you describe above it would look like:
minikube start --vm-driver=hyperkit --disk-size=40g --docker-opt="--data-root /mnt/sda1"
Keep in mind that if you try to modify an existing minikube cluster you either have to copy /var/lib/docker to /mnt/sda1 (as the previous answer also suggested) before restarting, or delete and rebuild the cluster.
Update:
After experimentation, I noticed that the above solution will not work the first time you run minikube start as it somehow interferes with minikube's own core-system build and boot-up process.
In practice this means that you need to run minikube start at least once without the --docker-opt to build the core system and then re-run it with --docker-opt.
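To confirm the daemon actually picked up the new data root after the restart, you can log back into the VM and check Docker's root directory and the free space on the target partition:
minikube ssh
docker info --format '{{ .DockerRootDir }}'   # should print /mnt/sda1/... if the option took effect
df -h /mnt/sda1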