I am using minikube on Linux to get started with Kubernetes. Following the examples in the README and using the none VM driver, I do the following.
$ minikube start --vm-driver=none
Starting local Kubernetes v1.9.0 cluster...
Starting VM...
Getting VM IP address...
Moving files into cluster...
Setting up certs...
Connecting to cluster...
Setting up kubeconfig...
Starting cluster components...
Kubectl is now configured to use the cluster.
===================
WARNING: IT IS RECOMMENDED NOT TO RUN THE NONE DRIVER ON PERSONAL WORKSTATIONS
The 'none' driver will run an insecure kubernetes apiserver as root that may leave the host vulnerable to CSRF attacks
When using the none driver, the kubectl config and credentials generated will be root owned and will appear in the root home directory.
You will need to move the files to the appropriate location and then set the correct permissions. An example of this is below:
sudo mv /root/.kube $HOME/.kube # this will write over any previous configuration
sudo chown -R $USER $HOME/.kube
sudo chgrp -R $USER $HOME/.kube
sudo mv /root/.minikube $HOME/.minikube # this will write over any previous configuration
sudo chown -R $USER $HOME/.minikube
sudo chgrp -R $USER $HOME/.minikube
This can also be done automatically by setting the env var CHANGE_MINIKUBE_NONE_USER=true
Loading cached images from config file.
$ kubectl get nodes
No resources found.
$ kubectl run hello-minikube --image=k8s.gcr.io/echoserver:1.4 --port=8080
deployment "hello-minikube" created
$ kubectl expose deployment hello-minikube --type=NodePort
service "hello-minikube" exposed
$ kubectl get pod
NAME READY STATUS RESTARTS AGE
hello-minikube-c6c6764d-h64t8 0/1 Pending 0 3m
Now, the problem is that this pod continues to remain pending. It looks like there are no nodes to run it on but I do not know why. Where am I going wrong?
EDIT: Here is the output of describing the pod.
$ kubectl describe pod hello-minikube-c6c6764d-h64t8
Name: hello-minikube-c6c6764d-h64t8
Namespace: default
Node: <none>
Labels: pod-template-hash=72723208
run=hello-minikube
Annotations: <none>
Status: Pending
IP:
Controlled By: ReplicaSet/hello-minikube-c6c6764d
Containers:
hello-minikube:
Image: k8s.gcr.io/echoserver:1.4
Port: 8080/TCP
Environment: <none>
Mounts:
/var/run/secrets/kubernetes.io/serviceaccount from default-token-dw4j7 (ro)
Conditions:
Type Status
PodScheduled False
Volumes:
default-token-dw4j7:
Type: Secret (a volume populated by a Secret)
SecretName: default-token-dw4j7
Optional: false
QoS Class: BestEffort
Node-Selectors: <none>
Tolerations: <none>
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning FailedScheduling 20h (x4 over 20h) default-scheduler no nodes available to schedule pods
Warning FailedScheduling 20h (x4 over 20h) default-scheduler no nodes available to schedule pods
Warning FailedScheduling 20h (x5 over 20h) default-scheduler no nodes available to schedule pods
Warning FailedScheduling 19h (x5 over 19h) default-scheduler no nodes available to schedule pods
Warning FailedScheduling 19h (x5 over 19h) default-scheduler no nodes available to schedule pods
Warning FailedScheduling 18h (x5 over 18h) default-scheduler no nodes available to schedule pods
Warning FailedScheduling 1s (x5 over 16s) default-scheduler no nodes available to schedule pods
Try running minikube status. If it fails with the permission error shown below, the following command fixed it for me:
sudo chown -R $USER /home/docker/.minikube/machines/minikube/config.json
Error with minikube status:
"X Error getting host status: load: filestore: open /home/docker/.minikube/machines/minikube/config.json: permission denied
*
* Sorry that minikube crashed. If this was unexpected, we would love to hear from you:
- https://github.com/kubernetes/minikube/issues/new/choose"
I'm using Kubernetes version 1.19.16 on a bare-metal Ubuntu 18.04 LTS server. When I try to deploy the nginx-ingress YAML file, it always fails with the errors below.
I followed these steps to deploy nginx-ingress:
$ git clone https://github.com/nginxinc/kubernetes-ingress.git
cd kubernetes-ingress/deployments
kubernetes-ingress/deployments$ git branch
* main
$ kubectl apply -f common/ns-and-sa.yaml
$ kubectl apply -f rbac/rbac.yaml
$ kubectl apply -f rbac/ap-rbac.yaml
$ kubectl apply -f common/default-server-secret.yaml
$ kubectl apply -f common/nginx-config.yaml
$ kubectl apply -f deployment/nginx-ingress.yaml
deployment.apps/nginx-ingress created
$ kubectl get pods -n nginx-ingress -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
nginx-ingress-75c4bd64bd-mm52x 0/1 Error 2 21s 10.244.1.5 k8s-master <none> <none>
$ kubectl -n nginx-ingress get all
NAME READY STATUS RESTARTS AGE
pod/nginx-ingress-75c4bd64bd-mm52x 0/1 CrashLoopBackOff 12 38m
NAME READY UP-TO-DATE AVAILABLE AGE
deployment.apps/nginx-ingress 0/1 1 0 38m
NAME DESIRED CURRENT READY AGE
replicaset.apps/nginx-ingress-75c4bd64bd 1 1 0 38m
$ kubectl logs nginx-ingress-75c4bd64bd-mm52x -n nginx-ingress
W1003 04:53:02.833073 1 flags.go:273] Ignoring unhandled arguments: []
I1003 04:53:02.833154 1 flags.go:190] Starting NGINX Ingress Controller Version=2.3.1 PlusFlag=false
I1003 04:53:02.833158 1 flags.go:191] Commit=a8742472b9ddf27433b6b1de49d250aa9a7cb47e Date=2022-09-16T08:09:31Z DirtyState=false Arch=linux/amd64 Go=go1.18.5
I1003 04:53:02.844374 1 main.go:210] Kubernetes version: 1.19.16
F1003 04:53:02.846604 1 main.go:225] Error when getting IngressClass nginx: ingressclasses.networking.k8s.io "nginx" not found
$ kubectl describe pods nginx-ingress-75c4bd64bd-mm52x -n nginx-ingress
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 3m6s default-scheduler Successfully assigned nginx-ingress/nginx-ingress-75c4bd64bd-mm52x to k8s-worker-1
Normal Pulled 87s (x5 over 3m5s) kubelet Container image "nginx/nginx-ingress:2.3.1" already present on machine
Normal Created 87s (x5 over 3m5s) kubelet Created container nginx-ingress
Normal Started 87s (x5 over 3m5s) kubelet Started container nginx-ingress
Warning BackOff 75s (x10 over 3m3s) kubelet Back-off restarting failed container
The NGINX Ingress Controller Deployment file is linked for reference.
Since I'm using the main branch of the kubernetes-ingress.git repository, I'm not sure whether it is compatible with my Kubernetes version.
Can anyone share some pointers to solve this?
I think you missed creating the IngressClass named "nginx", which is why the controller cannot find it: https://github.com/nginxinc/kubernetes-ingress/blob/main/deployments/common/ingress-class.yaml#L4
kubectl apply -f common/ingress-class.yaml
You can follow the steps from this document: https://docs.nginx.com/nginx-ingress-controller/installation/installation-with-manifests/
I have been playing around with minikube, and after a set of operations the output of kubectl get pod -w looks like this:
nginx 1/1 Running 2 10m
nginx 1/1 Running 3 10m
nginx 0/1 Completed 2 10m
nginx 0/1 CrashLoopBackOff 2 11m
nginx 1/1 Running 3 11m
nginx 1/1 Running 3 12m
I don't understand the counts shown on lines 3 and 4. What exactly does the restart count convey?
About the CrashLoopBackOff Status:
A CrashLoopBackOff means that you have a pod starting, crashing, starting again, and then crashing again.
Failed containers that are restarted by the kubelet are restarted with an exponential back-off delay (10s, 20s, 40s …) capped at five minutes, and the delay is reset after ten minutes of successful execution.
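The delay schedule described above can be sketched in a few lines of Python. This only illustrates the arithmetic (start at 10s, double each time, cap at 300s); the function name backoff_delays is hypothetical and this is not the kubelet's actual code.

```python
def backoff_delays(restarts, base=10, cap=300):
    """Return the delay in seconds before each of the first `restarts` restarts.

    Assumes the schedule quoted above: start at 10s, double each time,
    and never exceed five minutes (300s).
    """
    delays = []
    delay = base
    for _ in range(restarts):
        delays.append(min(delay, cap))
        delay = min(delay * 2, cap)
    return delays


if __name__ == "__main__":
    print(backoff_delays(7))
```

The reset after ten minutes of successful execution is not modelled here; it would simply start the sequence again from the base delay.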
CrashLoopBackOff events occur for different reasons; most cases are related to one of the following:
- The application inside the container keeps crashing
- Some parameters of the pod or container have been configured incorrectly
- An error was made during the deployment
Whenever you face a CrashLoopBackOff do a kubectl describe to investigate:
kubectl describe pod POD_NAME --namespace NAMESPACE_NAME
user@minikube:~$ kubectl describe pod ubuntu-5d4bb4fd84-8gl67 --namespace default
Name: ubuntu-5d4bb4fd84-8gl67
Namespace: default
Priority: 0
Node: minikube/192.168.39.216
Start Time: Thu, 09 Jan 2020 09:51:03 +0000
Labels: app=ubuntu
pod-template-hash=5d4bb4fd84
Status: Running
Controlled By: ReplicaSet/ubuntu-5d4bb4fd84
Containers:
ubuntu:
Container ID: docker://c4c0295e1e050b5e395fc7b368a8170f863159879821dd2562bc2938d17fc6fc
Image: ubuntu
Image ID: docker-pullable://ubuntu@sha256:250cc6f3f3ffc5cdaa9d8f4946ac79821aafb4d3afc93928f0de9336eba21aa4
State: Terminated
Reason: Completed
Exit Code: 0
Started: Thu, 09 Jan 2020 09:54:37 +0000
Finished: Thu, 09 Jan 2020 09:54:37 +0000
Last State: Terminated
Reason: Completed
Exit Code: 0
Started: Thu, 09 Jan 2020 09:53:05 +0000
Finished: Thu, 09 Jan 2020 09:53:05 +0000
Ready: False
Restart Count: 5
Mounts:
/var/run/secrets/kubernetes.io/serviceaccount from default-token-xxxst (ro)
Conditions:
Type Status
Initialized True
Ready False
ContainersReady False
PodScheduled True
Volumes:
default-token-xxxst:
Type: Secret (a volume populated by a Secret)
SecretName: default-token-xxxst
Optional: false
QoS Class: BestEffort
Tolerations: node.kubernetes.io/not-ready:NoExecute for 300s
node.kubernetes.io/unreachable:NoExecute for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 7m16s default-scheduler Successfully assigned default/ubuntu-5d4bb4fd84-8gl67 to minikube
Normal Created 5m59s (x4 over 6m52s) kubelet, minikube Created container ubuntu
Normal Started 5m58s (x4 over 6m52s) kubelet, minikube Started container ubuntu
Normal Pulling 5m17s (x5 over 7m5s) kubelet, minikube Pulling image "ubuntu"
Normal Pulled 5m15s (x5 over 6m52s) kubelet, minikube Successfully pulled image "ubuntu"
Warning BackOff 2m2s (x24 over 6m43s) kubelet, minikube Back-off restarting failed container
The Events section will provide you with a detailed explanation of what happened.
RestartCount is the number of times the container inside a pod has been restarted. It is calculated from the number of dead containers that have not yet been removed.
The -w on the command is the watch flag; the column headers are as listed below:
$ kubectl get pods -w
NAME READY STATUS RESTARTS AGE
nginx 1/1 Running 0 21m
To get detailed output, use the -o wide flag:
$ kubectl get pods -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
nginx 1/1 Running 1 21h 10.244.2.36 worker-node-2 <none> <none>
So the READY field shows how many of the pod's containers are ready, and the details can be seen with the describe pod command. Refer to Pod Lifecycle.
$ kubectl describe pod nginx| grep -i -A6 "Conditions"
Conditions:
Type Status
Initialized True
Ready True
ContainersReady True
PodScheduled True
The RESTARTS field is tracked under Restart Count; grep it from the pod description as below.
$ kubectl describe pod nginx | grep -i "Restart"
Restart Count: 0
As a test, we now restart the above container and see which fields are updated.
We find the node where our container is running and kill the container on that node with the docker command; it should then be restarted automatically by Kubernetes.
$ kubectl get pods -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
nginx 1/1 Running 0 21h 10.244.2.36 worker-node-2 <none> <none>
ubuntu@worker-node-2:~$ sudo docker ps -a | grep -i nginx
4c8e2e6bf67c nginx "nginx -g 'daemon of…" 22 hours ago Up 22 hours
ubuntu@worker-node-2:~$ sudo docker kill 4c8e2e6bf67c
4c8e2e6bf67c
POD Status is changed to ERROR
READY count goes to 0/1
ubuntu@cluster-master:~$ kubectl get pods -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
nginx 0/1 Error 0 21h 10.244.2.36 worker-node-2 <none> <none>
Once POD recovers the failed container.
READY count is 1/1 again
STATUS changes back to running
RESTARTS count is incremented by 1
ubuntu@cluster-master:~$ kubectl get pods -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
nginx 1/1 Running 1 21h 10.244.2.36 worker-node-2 <none> <none>
Check restart by describe command as well
$ kubectl describe pods nginx | grep -i "Restart"
Restart Count: 1
The values in your output are not inconsistent. That is how a pod with a restartPolicy of Always works: it keeps trying to bring back the failed container, with the CrashLoopBackOff delay growing until it reaches its cap.
Refer to Pod State Examples:
Pod is running and has one Container. Container exits with success.
Log completion event.
If restartPolicy is:
Always: Restart Container; Pod phase stays Running.
OnFailure: Pod phase becomes Succeeded.
Never: Pod phase becomes Succeeded.
List the restarted pods across all namespaces:
kubectl get pods -A |awk '$5 != "0" {print $0}'
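For readers who prefer to post-process the output in a script, here is a rough Python equivalent of the awk filter above. It is a sketch that assumes the column layout printed by kubectl get pods -A (NAMESPACE NAME READY STATUS RESTARTS AGE); the function name restarted_pods and the sample text are made up for illustration and are not output from any of the clusters discussed here.

```python
def restarted_pods(get_pods_output):
    """Return (namespace, name, restarts) for pods whose RESTARTS is non-zero.

    Expects the text printed by `kubectl get pods -A`, whose columns are
    NAMESPACE NAME READY STATUS RESTARTS AGE.
    """
    rows = []
    for line in get_pods_output.strip().splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) < 5:
            continue
        namespace, name, restarts = fields[0], fields[1], fields[4]
        if restarts != "0":
            rows.append((namespace, name, int(restarts)))
    return rows


# Hypothetical sample input, invented for this example.
SAMPLE = """\
NAMESPACE     NAME                       READY   STATUS    RESTARTS   AGE
default       nginx                      1/1     Running   3          12m
kube-system   coredns-5644d7b6d9-abcde   1/1     Running   0          2d
"""

if __name__ == "__main__":
    for row in restarted_pods(SAMPLE):
        print(row)
```

Unlike the awk one-liner, which also prints the header line because its fifth field is not "0", this version skips the header explicitly.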
I followed Istio's official documentation to set up Istio for the sample Bookinfo app with minikube, but I'm getting an Unable to connect to the server: net/http: TLS handshake timeout error. These are the steps that I followed (I have kubectl and minikube installed):
minikube start
curl -L https://git.io/getLatestIstio | sh -
cd istio-1.0.3
export PATH=$PWD/bin:$PATH
kubectl apply -f install/kubernetes/helm/istio/templates/crds.yaml
kubectl apply -f install/kubernetes/istio-demo-auth.yaml
kubectl get pods -n istio-system
This is the terminal output I'm getting
$ kubectl get pods -n istio-system
NAME READY STATUS RESTARTS AGE
grafana-9cfc9d4c9-xg7bh 1/1 Running 0 4m
istio-citadel-6d7f9c545b-lwq8s 1/1 Running 0 3m
istio-cleanup-secrets-69hdj 0/1 Completed 0 4m
istio-egressgateway-75dbb8f95d-k6xj2 1/1 Running 0 4m
istio-galley-6d74549bb9-mdc97 0/1 ContainerCreating 0 4m
istio-grafana-post-install-xz9rk 0/1 Completed 0 4m
istio-ingressgateway-6bd4957bc-vhbct 1/1 Running 0 4m
istio-pilot-7f8c49bbd8-x6bmm 0/2 Pending 0 4m
istio-policy-6c65d8cff4-hx2c7 2/2 Running 0 4m
istio-security-post-install-gjfj2 0/1 Completed 0 4m
istio-sidecar-injector-74855c54b9-nnqgx 0/1 ContainerCreating 0 3m
istio-telemetry-65cdd46d6c-rqzfw 2/2 Running 0 4m
istio-tracing-ff94688bb-hgz4h 1/1 Running 0 3m
prometheus-f556886b8-chdxw 1/1 Running 0 4m
servicegraph-778f94d6f8-9xgw5 1/1 Running 0 3m
$kubectl describe pod istio-galley-6d74549bb9-mdc97
Error from server (NotFound): pods "istio-galley-5bf4d6b8f7-8s2z9" not found
pod describe output
$ kubectl -n istio-system describe pod istio-galley-6d74549bb9-mdc97
Name: istio-galley-6d74549bb9-mdc97
Namespace: istio-system
Node: minikube/172.17.0.4
Start Time: Sat, 03 Nov 2018 04:29:57 +0000
Labels: istio=galley
pod-template-hash=1690826493
Annotations: scheduler.alpha.kubernetes.io/critical-pod=
sidecar.istio.io/inject=false
Status: Pending
IP:
Controlled By: ReplicaSet/istio-galley-5bf4d6b8f7
Containers:
validator:
Container ID:
Image: gcr.io/istio-release/galley:1.0.0
Image ID:
Ports: 443/TCP, 9093/TCP
Host Ports: 0/TCP, 0/TCP
Command: /usr/local/bin/galley
validator --deployment-namespace=istio-system
--caCertFile=/etc/istio/certs/root-cert.pem
--tlsCertFile=/etc/istio/certs/cert-chain.pem
--tlsKeyFile=/etc/istio/certs/key.pem
--healthCheckInterval=2s
--healthCheckFile=/health
--webhook-config-file
/etc/istio/config/validatingwebhookconfiguration.yaml
State: Waiting
Reason: ContainerCreating
Ready: False
Restart Count: 0
Requests:
cpu: 10m
Liveness: exec [/usr/local/bin/galley probe --probe-path=/health --interval=4s] delay=4s timeout=1s period=4s #success=1 #failure=3
Readiness: exec [/usr/local/bin/galley probe --probe-path=/health --interval=4s] delay=4s timeout=1s period=4s #success=1 #failure=3
Environment: <none>
Mounts:
/etc/istio/certs from certs (ro)
/etc/istio/config from config (ro)
/var/run/secrets/kubernetes.io/serviceaccount from istio-galley-service-account-token-9pcmv(ro)
Conditions:
Type Status
Initialized True
Ready False
PodScheduled True
Volumes:
certs:
Type: Secret (a volume populated by a Secret)
SecretName: istio.istio-galley-service-account
Optional: false
config:
Type: ConfigMap (a volume populated by a ConfigMap)
Name: istio-galley-configuration
Optional: false
istio-galley-service-account-token-9pcmv:
Type: Secret (a volume populated by a Secret)
SecretName: istio-galley-service-account-token-9pcmv
Optional: false
QoS Class: Burstable
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute for 300s
node.kubernetes.io/unreachable:NoExecute for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 1m default-scheduler Successfully assigned istio-galley-5bf4d6b8f7-8t8qz to minikube
Normal SuccessfulMountVolume 1m kubelet, minikube MountVolume.SetUp succeeded for volume "config"
Normal SuccessfulMountVolume 1m kubelet, minikube MountVolume.SetUp succeeded for volume "istio-galley-service-account-token-9pcmv"
Warning FailedMount 27s (x7 over 1m) kubelet, minikube MountVolume.SetUp failed for volume "certs" : secrets "istio.istio-galley-service-account" not found
After some time:
$ kubectl describe pod istio-galley-6d74549bb9-mdc97
Unable to connect to the server: net/http: TLS handshake timeout
So I waited for the istio-sidecar-injector and istio-galley containers to be created. If I run kubectl get pods -n istio-system again, or any other kubectl command, I get the Unable to connect to the server: net/http: TLS handshake timeout error.
Please help me with this issue.
P.S. I'm running minikube on Ubuntu 16.04.
Thanks in advance.
Looks like you are running into this and this: the secret istio.istio-galley-service-account is missing in your istio-system namespace. You can try the workaround as described:
Install as outlined in the docs: https://istio.io/docs/setup/kubernetes/minimal-install/. The missing secret is created by the citadel pod, which isn't running due to the --set security.enabled=false flag; setting that to true starts citadel and the secret is created.
Problem resolved when I ran minikube start --memory=4048. Maybe it was a memory issue.
When using either the istio-demo.yaml or istio-demo-auth.yaml, you'll find that a minimum of 4GB RAM is required to run Istio (particularly when you deploy its sample app, Bookinfo, too). This is true whether you're running minikube or Docker Desktop, and is one of the gotchas that Meshery identifies and attempts to help those deploying Istio or other service meshes circumvent.
I am trying Alluxio 1.7.1 with Docker 1.13.1 and Kubernetes 1.9.6 and 1.10.1.
I created the alluxio docker image as per the instructions on https://www.alluxio.org/docs/1.7/en/Running-Alluxio-On-Docker.html
Then I followed the https://www.alluxio.org/docs/1.7/en/Running-Alluxio-On-Kubernetes.html guide to run Alluxio on Kubernetes. I was able to bring up the Alluxio master pod properly, but when I try to bring up the Alluxio worker I get an "Address in use" error. I have not modified anything in the YAML files that I downloaded from the Alluxio git repository. The only changes I made were to the Alluxio Docker image name and the API version in the YAML files so that they match my Kubernetes version.
I checked the ports in use in my Kubernetes cluster, including on the nodes. None of the ports that Alluxio wants are being used by any other process, but I still get the "Address in use" error. I don't understand what I can do to debug further or what I should change to make this work. I don't have any other application running on my cluster. I tried both a single-node and a multi-node Kubernetes cluster, and both Kubernetes 1.9 and 1.10.
There is definitely some issue from alluxio worker side which I am unable to debug.
This is the log that I get from worker pod:
[root@vm-sushil-scrum1-08062018-alluxio-1 kubernetes]# kubectl logs po/alluxio-worker-knqt4
Formatting Alluxio Worker @ vm-sushil-scrum1-08062018-alluxio-1
2018-06-08 10:09:55,723 INFO Configuration - Configuration file /opt/alluxio/conf/alluxio-site.properties loaded.
2018-06-08 10:09:55,845 INFO Format - Formatting worker data folder: /alluxioworker/
2018-06-08 10:09:55,845 INFO Format - Formatting Data path for tier 0:/dev/shm/alluxioworker
2018-06-08 10:09:55,856 INFO Format - Formatting complete
2018-06-08 10:09:56,357 INFO Configuration - Configuration file /opt/alluxio/conf/alluxio-site.properties loaded.
2018-06-08 10:09:56,549 INFO TieredIdentityFactory - Initialized tiered identity TieredIdentity(node=10.194.11.7, rack=null)
2018-06-08 10:09:56,866 INFO BlockWorkerFactory - Creating alluxio.worker.block.BlockWorker
2018-06-08 10:09:56,866 INFO FileSystemWorkerFactory - Creating alluxio.worker.file.FileSystemWorker
2018-06-08 10:09:56,942 WARN StorageTier - Failed to verify memory capacity
2018-06-08 10:09:57,082 INFO log - Logging initialized @1160ms
2018-06-08 10:09:57,509 INFO AlluxioWorkerProcess - Domain socket data server is enabled at /opt/domain.
Exception in thread "main" java.lang.RuntimeException: java.lang.RuntimeException: io.netty.channel.unix.Errors$NativeIoException: bind(..) failed: Address in use
at alluxio.worker.AlluxioWorkerProcess.<init>(AlluxioWorkerProcess.java:164)
at alluxio.worker.WorkerProcess$Factory.create(WorkerProcess.java:45)
at alluxio.worker.WorkerProcess$Factory.create(WorkerProcess.java:37)
at alluxio.worker.AlluxioWorker.main(AlluxioWorker.java:56)
Caused by: java.lang.RuntimeException: io.netty.channel.unix.Errors$NativeIoException: bind(..) failed: Address in use
at alluxio.util.CommonUtils.createNewClassInstance(CommonUtils.java:224)
at alluxio.worker.DataServer$Factory.create(DataServer.java:45)
at alluxio.worker.AlluxioWorkerProcess.<init>(AlluxioWorkerProcess.java:159)
... 3 more
Caused by: io.netty.channel.unix.Errors$NativeIoException: bind(..) failed: Address in use
at io.netty.channel.unix.Errors.newIOException(Errors.java:117)
at io.netty.channel.unix.Socket.bind(Socket.java:259)
at io.netty.channel.epoll.EpollServerDomainSocketChannel.doBind(EpollServerDomainSocketChannel.java:75)
at io.netty.channel.AbstractChannel$AbstractUnsafe.bind(AbstractChannel.java:504)
at io.netty.channel.DefaultChannelPipeline$HeadContext.bind(DefaultChannelPipeline.java:1226)
at io.netty.channel.AbstractChannelHandlerContext.invokeBind(AbstractChannelHandlerContext.java:495)
at io.netty.channel.AbstractChannelHandlerContext.bind(AbstractChannelHandlerContext.java:480)
at io.netty.channel.DefaultChannelPipeline.bind(DefaultChannelPipeline.java:973)
at io.netty.channel.AbstractChannel.bind(AbstractChannel.java:213)
at io.netty.bootstrap.AbstractBootstrap$2.run(AbstractBootstrap.java:354)
at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:399)
at io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:305)
at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:131)
at java.lang.Thread.run(Thread.java:748)
-----------------------
[root@vm-sushil-scrum1-08062018-alluxio-1 kubernetes]# kubectl get all
NAME DESIRED CURRENT READY UP-TO-DATE AVAILABLE NODE SELECTOR AGE
ds/alluxio-worker 1 1 0 1 0 <none> 42m
ds/alluxio-worker 1 1 0 1 0 <none> 42m
NAME DESIRED CURRENT AGE
statefulsets/alluxio-master 1 1 44m
NAME READY STATUS RESTARTS AGE
po/alluxio-master-0 1/1 Running 0 44m
po/alluxio-worker-knqt4 0/1 Error 12 42m
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
svc/alluxio-master ClusterIP None <none> 19998/TCP,19999/TCP 44m
svc/kubernetes ClusterIP 10.254.0.1 <none> 443/TCP 1h
---------------------
[root@vm-sushil-scrum1-08062018-alluxio-1 kubernetes]# kubectl describe po/alluxio-worker-knqt4
Name: alluxio-worker-knqt4
Namespace: default
Node: vm-sushil-scrum1-08062018-alluxio-1/10.194.11.7
Start Time: Fri, 08 Jun 2018 10:09:05 +0000
Labels: app=alluxio
controller-revision-hash=3081903053
name=alluxio-worker
pod-template-generation=1
Annotations: <none>
Status: Running
IP: 10.194.11.7
Controlled By: DaemonSet/alluxio-worker
Containers:
alluxio-worker:
Container ID: docker://40a1eff2cd4dff79d9189d7cb0c4826a6b6e4871fbac65221e7cdd341240e358
Image: alluxio:1.7.1
Image ID: docker://sha256:b080715bd53efc783ee5f54e7f1c451556f93e7608e60e05b4615d32702801af
Ports: 29998/TCP, 29999/TCP, 29996/TCP
Command:
/entrypoint.sh
Args:
worker
State: Waiting
Reason: CrashLoopBackOff
Last State: Terminated
Reason: Error
Exit Code: 1
Started: Fri, 08 Jun 2018 11:01:37 +0000
Finished: Fri, 08 Jun 2018 11:02:02 +0000
Ready: False
Restart Count: 14
Limits:
cpu: 1
memory: 2G
Requests:
cpu: 500m
memory: 2G
Environment Variables from:
alluxio-config ConfigMap Optional: false
Environment:
ALLUXIO_WORKER_HOSTNAME: (v1:status.hostIP)
Mounts:
/dev/shm from alluxio-ramdisk (rw)
/opt/domain from alluxio-domain (rw)
/var/run/secrets/kubernetes.io/serviceaccount from default-token-7xlz7 (ro)
Conditions:
Type Status
Initialized True
Ready False
PodScheduled True
Volumes:
alluxio-ramdisk:
Type: EmptyDir (a temporary directory that shares a pod's lifetime)
Medium: Memory
alluxio-domain:
Type: HostPath (bare host directory volume)
Path: /tmp/domain
HostPathType: Directory
default-token-7xlz7:
Type: Secret (a volume populated by a Secret)
SecretName: default-token-7xlz7
Optional: false
QoS Class: Burstable
Node-Selectors: <none>
Tolerations: node.kubernetes.io/disk-pressure:NoSchedule
node.kubernetes.io/memory-pressure:NoSchedule
node.kubernetes.io/not-ready:NoExecute
node.kubernetes.io/unreachable:NoExecute
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal SuccessfulMountVolume 56m kubelet, vm-sushil-scrum1-08062018-alluxio-1 MountVolume.SetUp succeeded for volume "alluxio-domain"
Normal SuccessfulMountVolume 56m kubelet, vm-sushil-scrum1-08062018-alluxio-1 MountVolume.SetUp succeeded for volume "alluxio-ramdisk"
Normal SuccessfulMountVolume 56m kubelet, vm-sushil-scrum1-08062018-alluxio-1 MountVolume.SetUp succeeded for volume "default-token-7xlz7"
Normal Pulled 53m (x5 over 56m) kubelet, vm-sushil-scrum1-08062018-alluxio-1 Container image "alluxio:1.7.1" already present on machine
Normal Created 53m (x5 over 56m) kubelet, vm-sushil-scrum1-08062018-alluxio-1 Created container
Normal Started 53m (x5 over 56m) kubelet, vm-sushil-scrum1-08062018-alluxio-1 Started container
Warning BackOff 1m (x222 over 55m) kubelet, vm-sushil-scrum1-08062018-alluxio-1 Back-off restarting failed container
[root@vm-sushil-scrum1-08062018-alluxio-1 kubernetes]# lsof -n -i :19999 | grep LISTEN
java 8949 root 29u IPv4 12518521 0t0 TCP *:dnp-sec (LISTEN)
[root@vm-sushil-scrum1-08062018-alluxio-1 kubernetes]# lsof -n -i :19998 | grep LISTEN
java 8949 root 19u IPv4 12520458 0t0 TCP *:iec-104-sec (LISTEN)
[root@vm-sushil-scrum1-08062018-alluxio-1 kubernetes]# lsof -n -i :29998 | grep LISTEN
[root@vm-sushil-scrum1-08062018-alluxio-1 kubernetes]# lsof -n -i :29999 | grep LISTEN
[root@vm-sushil-scrum1-08062018-alluxio-1 kubernetes]# lsof -n -i :29996 | grep LISTEN
The alluxio-worker container keeps restarting and failing with the same error every time.
Please advise what I can do to solve this.
Thanks
The problem was the short-circuit Unix domain socket path. I was using whatever was present by default in the Alluxio git repository. In the default integration/kubernetes/conf/alluxio.properties.template, the address for ALLUXIO_WORKER_DATA_SERVER_DOMAIN_SOCKET_ADDRESS was not complete. This is properly explained in https://www.alluxio.org/docs/1.7/en/Running-Alluxio-On-Docker.html for enabling short-circuit reads in Alluxio worker containers using Unix domain sockets.
Just because the Unix domain socket path was incomplete, the Alluxio worker was not able to come up in Kubernetes when short-circuit reads were enabled.
Once I corrected the path in integration/kubernetes/conf/alluxio.properties to ALLUXIO_WORKER_DATA_SERVER_DOMAIN_SOCKET_ADDRESS=/opt/domain/d, things started working properly. Some tests are still failing, but at least the Alluxio setup is properly up. I will now debug why some tests are failing.
I have submitted this fix to the Alluxio git repository for them to merge into the master branch.
https://github.com/Alluxio/alluxio/pull/7376
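A possible reason why an incomplete socket path shows up as "Address in use" rather than a path error: on Linux, binding a Unix domain socket to a path that already exists (for example a mounted directory such as /opt/domain) is typically rejected with EADDRINUSE, whereas binding to a new file name inside that directory succeeds. The Python sketch below illustrates that behaviour under this assumption. It uses a temporary directory as a stand-in for /opt/domain, the helper name try_bind is hypothetical, and it does not involve Alluxio or Netty at all.

```python
import errno
import os
import socket
import tempfile


def try_bind(path):
    """Try to bind a Unix domain socket at `path`.

    Returns None on success, or the symbolic errno name on failure.
    """
    sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    try:
        sock.bind(path)
        return None
    except OSError as exc:
        return errno.errorcode.get(exc.errno, str(exc.errno))
    finally:
        sock.close()


if __name__ == "__main__":
    # The temporary directory plays the role of the mounted /opt/domain directory.
    with tempfile.TemporaryDirectory() as domain_dir:
        print("bind to the directory itself:", try_bind(domain_dir))
        print("bind to a file inside it:", try_bind(os.path.join(domain_dir, "d")))
```

This would also be consistent with the lsof checks in the question coming back empty for the worker's TCP ports: the failing bind in the stack trace is on a domain socket channel, not a TCP port.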
On the node where your worker is running, it seems that you have a port already in use.
Try to find which process is using it:
sudo lsof -n -i :80 | grep LISTEN
I read the alluxio configuration files: try with ports 19998, 19999, 29996, 29998, 29999 substituting 80 in the above command.
I have a Raspberry Pi cluster (one master, three nodes).
My base image is Raspbian Stretch Lite.
I already have a basic Kubernetes setup where the master can see all of its nodes (kubectl get nodes) and they are all running.
I used the Weave network plugin for network communication.
Once everything was set up, I tried to run an nginx pod on my cluster (first with some replicas, but now with just one pod) as follows:
kubectl run my-nginx --image=nginx
But somehow the pod gets stuck in the status "ContainerCreating". When I run docker images I can't see the nginx image being pulled. An nginx image is normally not that large, so it should have been pulled by now (15 minutes).
kubectl describe pods reports that the pod sandbox failed to create and that Kubernetes will re-create it.
I searched everywhere about this issue and tried the solutions on Stack Overflow (rebooting to restart the cluster, searching the describe pods output, trying a different network plugin, Flannel), but I can't see what the actual problem is.
I did the exact same thing in VirtualBox (just Ubuntu, not ARM) and everything worked.
First I thought it was a permission issue because I run everything as a normal user, but in the VM I did the same thing and nothing changed.
Then I checked kubectl get pods --all-namespaces to verify that the pods for the Weave network and kube-dns are running, and nothing is wrong there either.
Is this a firewall issue on the Raspberry Pi?
Is the Weave network plugin not compatible with ARM devices (even though the Kubernetes website says it is)?
I'm guessing there is an API network problem and that's why I can't get my pod running on a node.
[EDIT]
Log files
kubectl describe podName
Name: my-nginx-9d5677d94-g44l6
Namespace: default
Node: kubenode1/10.1.88.22
Start Time: Tue, 06 Mar 2018 08:24:13 +0000
Labels: pod-template-hash=581233850
run=my-nginx
Annotations: <none>
Status: Pending
IP:
Controlled By: ReplicaSet/my-nginx-9d5677d94
Containers:
my-nginx:
Container ID:
Image: nginx
Image ID:
Port: 80/TCP
State: Waiting
Reason: ContainerCreating
Ready: False
Restart Count: 0
Environment: <none>
Mounts:
/var/run/secrets/kubernetes.io/serviceaccount from default-token-phdv5 (ro)
Conditions:
Type Status
Initialized True
Ready False
PodScheduled True
Volumes:
default-token-phdv5:
Type: Secret (a volume populated by a Secret)
SecretName: default-token-phdv5
Optional: false
QoS Class: BestEffort
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute for 300s
node.kubernetes.io/unreachable:NoExecute for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 5m default-scheduler Successfully assigned my-nginx-9d5677d94-g44l6 to kubenode1
Normal SuccessfulMountVolume 5m kubelet, kubenode1 MountVolume.SetUp succeeded for volume "default-token-phdv5"
Warning FailedCreatePodSandBox 1m kubelet, kubenode1 Failed create pod sandbox.
Normal SandboxChanged 1m kubelet, kubenode1 Pod sandbox changed, it will be killed and re-created.
kubectl logs podName
Error from server (BadRequest): container "my-nginx" in pod "my-nginx-9d5677d94-g44l6" is waiting to start: ContainerCreating
journalctl -u kubelet gives this error
Mar 12 13:42:45 kubeMaster kubelet[16379]: W0312 13:42:45.824314 16379 cni.go:171] Unable to update cni config: No networks found in /etc/cni/net.d
Mar 12 13:42:45 kubeMaster kubelet[16379]: E0312 13:42:45.824816 16379 kubelet.go:2104] Container runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:docker: network plugin is not ready: cni config uninitialized
The problem seems to be with my network plugin. In my /etc/systemd/system/kubelet.service.d/10.kubeadm.conf, the flags for the network plugin are present:
environment= kubelet_network_args --cni-bin-dir=/etc/cni/net.d
--network-plugin=cni
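The kubelet warning above says that no network configuration was found in /etc/cni/net.d. As a quick way to see what a node actually has in that directory, here is a minimal Python sketch. The function name cni_configs is hypothetical, the list of accepted file extensions is an assumption of this sketch, and a temporary directory stands in for /etc/cni/net.d so the example runs anywhere.

```python
import os
import tempfile

# File extensions treated as CNI configuration here (an assumption of this sketch).
CNI_EXTENSIONS = (".conf", ".conflist", ".json")


def cni_configs(conf_dir):
    """Return the sorted names of CNI config files found in conf_dir."""
    if not os.path.isdir(conf_dir):
        return []
    return sorted(
        name for name in os.listdir(conf_dir) if name.endswith(CNI_EXTENSIONS)
    )


if __name__ == "__main__":
    # A temporary directory stands in for /etc/cni/net.d.
    with tempfile.TemporaryDirectory() as conf_dir:
        print("before:", cni_configs(conf_dir))
        # Hypothetical file name; a real network plugin writes its own config file.
        open(os.path.join(conf_dir, "10-example.conflist"), "w").close()
        print("after:", cni_configs(conf_dir))
```

On a real node, calling cni_configs("/etc/cni/net.d") and getting an empty list would match the "No networks found" warning, which usually means the network plugin has not written its configuration on that node.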
Thank you all for responding to my question.
I solved my problem. For anyone who comes across this question in the future, the solution was as follows.
I had cloned my Raspberry Pi images because I wanted a basicConfig.img for when I needed to add a new node to my cluster or when one goes down.
Weave network (the plugin I used) got confused because the OS had the same machine-id on every node and on the master. When I deleted the machine-id, created a new one, and rebooted the nodes, my error was fixed.
The commands to do this were:
sudo rm /etc/machine-id
sudo rm /var/lib/dbus/machine-id
sudo dbus-uuidgen --ensure=/etc/machine-id
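To confirm that cloned images are the cause before regenerating anything, it can help to compare the machine-id of every node. The Python sketch below groups hosts by machine-id and reports the ids shared by more than one host. The function name duplicate_machine_ids and the id values are hypothetical; in practice the values would be collected by reading /etc/machine-id on each node.

```python
from collections import defaultdict


def duplicate_machine_ids(ids_by_host):
    """Return {machine_id: [hosts]} for every machine-id used by more than one host."""
    hosts_by_id = defaultdict(list)
    for host, machine_id in ids_by_host.items():
        hosts_by_id[machine_id.strip()].append(host)
    return {
        machine_id: sorted(hosts)
        for machine_id, hosts in hosts_by_id.items()
        if len(hosts) > 1
    }


# Hypothetical values, e.g. gathered by running `cat /etc/machine-id` on each node.
EXAMPLE = {
    "kubeMaster": "0123456789abcdef0123456789abcdef\n",
    "kubenode1": "0123456789abcdef0123456789abcdef\n",
    "kubenode2": "fedcba9876543210fedcba9876543210\n",
}

if __name__ == "__main__":
    for machine_id, hosts in duplicate_machine_ids(EXAMPLE).items():
        print(machine_id, "is shared by", ", ".join(hosts))
```

An empty result means every host has a unique machine-id, in which case the cause of the sandbox failure is likely elsewhere.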
Once again my patience was tested, because my Kubernetes setup was normal and my Raspberry Pi OS was normal. I found this with the help of someone in the Kubernetes community. This again shows how important and great our IT community is. To the people who come to this question in the future: I hope this solution fixes your error and reduces the time you spend searching for a stupid small thing.
You can see if it's network related by finding the node trying to pull the image:
kubectl describe pod <name> -n <namespace>
SSH to the node, and run docker pull nginx on it. If it's having issues pulling the image manually, then it might be network related.