My deployment uses a couple of volumes, all defined as ReadWriteOnce.
When applying the deployment to a clean cluster, the pod is created successfully.
However, if I update my deployment (e.g. update the container image), the new pod created for the deployment always fails on volume mount:
/Mugen$ kubectl get pods
NAME READY STATUS RESTARTS AGE
my-app-556c8d646b-4s2kg 5/5 Running 1 2d
my-app-6dbbd99cc4-h442r 0/5 ContainerCreating 0 39m
/Mugen$ kubectl describe pod my-app-6dbbd99cc4-h442r
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 9m default-scheduler Successfully assigned my-app-6dbbd99cc4-h442r to gke-my-test-default-pool-671c9db5-k71l
Warning FailedAttachVolume 9m attachdetach-controller Multi-Attach error for volume "pvc-b57e8a7f-1ca9-11e9-ae03-42010a8400a8" Volume is already used by pod(s) my-app-556c8d646b-4s2kg
Normal SuccessfulMountVolume 9m kubelet, gke-my-test-default-pool-671c9db5-k71l MountVolume.SetUp succeeded for volume "default-token-ksrbf"
Normal SuccessfulAttachVolume 9m attachdetach-controller AttachVolume.Attach succeeded for volume "pvc-2cc1955a-1cb2-11e9-ae03-42010a8400a8"
Normal SuccessfulAttachVolume 9m attachdetach-controller AttachVolume.Attach succeeded for volume "pvc-2c8dae3e-1cb2-11e9-ae03-42010a8400a8"
Normal SuccessfulMountVolume 9m kubelet, gke-my-test-default-pool-671c9db5-k71l MountVolume.SetUp succeeded for volume "pvc-2cc1955a-1cb2-11e9-ae03-42010a8400a8"
Normal SuccessfulMountVolume 9m kubelet, gke-my-test-default-pool-671c9db5-k71l MountVolume.SetUp succeeded for volume "pvc-2c8dae3e-1cb2-11e9-ae03-42010a8400a8"
Warning FailedMount 52s (x4 over 7m) kubelet, gke-my-test-default-pool-671c9db5-k71l Unable to mount volumes for pod "my-app-6dbbd99cc4-h442r_default(affe75e0-1edd-11e9-bb45-42010a840094)": timeout expired waiting for volumes to attach or mount for pod "default"/"my-app-6dbbd99cc4-h442r". list of unmounted volumes=[...]. list of unattached volumes=[...]
What is the best strategy to apply changes to such a deployment, then? Will there have to be some service outage in order to reuse the same persistent volumes? (I don't want to create new volumes - the data must be preserved.)
You will need to tolerate an outage here, due to the access mode. A Deployment strategy - .spec.strategy.type - of "Recreate" will achieve this: it deletes the existing Pods (unmounting the volumes) before creating new ones. See https://github.com/ContainerSolutions/k8s-deployment-strategies/blob/master/recreate/README.md
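A minimal sketch of the relevant field (the Deployment name and pod template are illustrative):
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app
spec:
  strategy:
    type: Recreate   # old Pods are deleted (releasing the RWO volumes) before new ones start
  selector:
    matchLabels:
      app: my-app
  template:
    metadata:
      labels:
        app: my-app
    spec:
      containers:
        - name: my-app
          image: my-app:latest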
That looks like an error caused by the ReadWriteOnce access mode. Remember that when you update a deployment, the new pods get created first and only then are the old ones killed. So the new pod tries to mount an already-mounted volume, and that's why you get that message.
Have you tried using a volume type that allows multiple readers/writers? You can see the list of volume types in the Kubernetes Volumes documentation.
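If your storage backend supports it (for example NFS, or Filestore on GKE), a claim with the ReadWriteMany access mode lets the old and new pods mount the volume at the same time, avoiding the Multi-Attach error. A minimal sketch (name and size are illustrative):
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: shared-data
spec:
  accessModes:
    - ReadWriteMany   # multiple nodes may mount this volume read-write
  resources:
    requests:
      storage: 10Gi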
I ended up with a better solution: all my client pods are only readers of the content, and an independent CI process writes it. I do the following:
From CI: write the content to a Google Cloud Storage bucket (gs://my-storage), then restart all frontend pods.
In the deployment definition, I sync (download) the entire bucket into the pod's volatile storage and serve it from the local file system for the best performance.
How to achieve that:
In the frontend Docker image, I added the gcloud installation block from https://github.com/GoogleCloudPlatform/cloud-sdk-docker/blob/master/debian_slim/Dockerfile:
ARG CLOUD_SDK_VERSION=249.0.0
ENV CLOUD_SDK_VERSION=$CLOUD_SDK_VERSION
ARG INSTALL_COMPONENTS
ENV PATH "$PATH:/opt/google-cloud-sdk/bin/"
RUN apt-get update -qqy && apt-get install -qqy \
curl \
gcc \
python-dev \
python-setuptools \
apt-transport-https \
lsb-release \
openssh-client \
git \
gnupg \
&& easy_install -U pip && \
pip install -U crcmod && \
export CLOUD_SDK_REPO="cloud-sdk-$(lsb_release -c -s)" && \
echo "deb https://packages.cloud.google.com/apt $CLOUD_SDK_REPO main" > /etc/apt/sources.list.d/google-cloud-sdk.list && \
curl https://packages.cloud.google.com/apt/doc/apt-key.gpg | apt-key add - && \
apt-get update && apt-get install -y google-cloud-sdk=${CLOUD_SDK_VERSION}-0 $INSTALL_COMPONENTS && \
gcloud config set core/disable_usage_reporting true && \
gcloud config set component_manager/disable_update_check true && \
gcloud config set metrics/environment github_docker_image && \
gcloud --version
VOLUME ["/root/.config"]
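To rebuild the frontend image with a pinned SDK version (the image tag is illustrative):
docker build --build-arg CLOUD_SDK_VERSION=249.0.0 -t my-frontend:latest .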
And in the deployment definition frontend.yaml I added the following lifecycle event:
...
spec:
...
containers:
...
lifecycle:
postStart:
exec:
command: ["gsutil", "-m", "rsync", "-r", "gs://my-storage", "/usr/share/nginx/html"]
To "refresh" the frontend pods when bucket content is updated, I simply run the following from my CI:
kubectl set env deployment/frontend K8S_FORCE=date +%s``
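Changing an environment variable modifies the pod template, which triggers a rolling update. On kubectl 1.15 and later, kubectl rollout restart achieves the same without a dummy variable:
kubectl rollout restart deployment/frontend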
Related
I am new to Kubernetes and am trying to get Apache Airflow working using Helm charts. After almost a week of struggling I am nowhere - I can't even get the example provided in the Apache Airflow documentation working. I use Pop!_OS 20.04 and microk8s.
When I run these commands:
kubectl create namespace airflow
helm repo add apache-airflow https://airflow.apache.org
helm install airflow apache-airflow/airflow --namespace airflow
The helm installation times out after five minutes.
kubectl get pods -n airflow
shows this list:
NAME READY STATUS RESTARTS AGE
airflow-postgresql-0 0/1 Pending 0 4m8s
airflow-redis-0 0/1 Pending 0 4m8s
airflow-worker-0 0/2 Pending 0 4m8s
airflow-scheduler-565d8587fd-vm8h7 0/2 Init:0/1 0 4m8s
airflow-triggerer-7f4477dcb6-nlhg8 0/1 Init:0/1 0 4m8s
airflow-webserver-684c5d94d9-qhhv2 0/1 Init:0/1 0 4m8s
airflow-run-airflow-migrations-rzm59 1/1 Running 0 4m8s
airflow-statsd-84f4f9898-sltw9 1/1 Running 0 4m8s
airflow-flower-7c87f95f46-qqqqx 0/1 Running 4 4m8s
Then when I run the below command:
kubectl describe pod airflow-postgresql-0 -n airflow
I get the below output (trimmed to just the events):
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning FailedScheduling 58s (x2 over 58s) default-scheduler 0/1 nodes are available: 1 pod has unbound immediate PersistentVolumeClaims.
Then I deleted the namespace using the following command:
kubectl delete ns airflow
At this point, the termination of the pods got stuck. Then I brought up the proxy in another terminal:
kubectl proxy
Then I issued the following command to force deletion of the namespace and all its pods and resources:
kubectl get ns airflow -o json | jq '.spec.finalizers=[]' | curl -X PUT http://localhost:8001/api/v1/namespaces/airflow/finalize -H "Content-Type: application/json" --data @-
Then I deleted the PVCs using the following command:
kubectl delete pvc --force --grace-period=0 --all -n airflow
This got stuck again, so I had to issue another command to force the deletion:
kubectl patch pvc data-airflow-postgresql-0 -p '{"metadata":{"finalizers":null}}' -n airflow
The PVCs get terminated at this point, and these two commands return nothing:
kubectl get pvc -n airflow
kubectl get all -n airflow
Then I restarted the machine and executed the helm install again (using the first and last commands from the first section of this question), but got the same result.
I then executed the following command (using the suggestions I found here):
kubectl describe pvc -n airflow
I got the following output (I am posting the events portion for the PostgreSQL PVC):
Type Reason Age From Message
---- ------ ---- ---- -------
Normal FailedBinding 2m58s (x42 over 13m) persistentvolume-controller no persistent volumes available for this claim and no storage class is set
So my assumption is that I need to provide a storage class as part of values.yaml.
Is my understanding right? How (and with what values) do I provide the required storage class in values.yaml?
If you installed with helm, you can uninstall with helm delete airflow -n airflow.
Here's a way to install airflow for testing purposes using default values:
Generate the manifest: helm template airflow apache-airflow/airflow -n airflow > airflow.yaml
Open airflow.yaml with your favorite editor and replace all volumeClaimTemplates with emptyDir volumes. An example of the change follows.
Create the namespace and install:
kubectl create namespace airflow
kubectl apply -f airflow.yaml --namespace airflow
You can copy files out from the pods if needed.
To delete: kubectl delete -f airflow.yaml --namespace airflow.
I have tried setting the maximum number of pods per node using the following during install:
curl -sfL https://get.k3s.io | INSTALL_K3S_EXEC="--max-pods 250" sh -s -
However, the K3s server then fails to start. It appears that the --max-pods flag has been deprecated, per the Kubernetes docs:
--max-pods int32 Default: 110
(DEPRECATED: This parameter should be set via the config file
specified by the Kubelet's --config flag. See
https://kubernetes.io/docs/tasks/administer-cluster/kubelet-config-file/
for more information.)
So with K3s, where is that kubelet config file and can/should it be set using something like the above method?
To update your existing installation with an increased max-pods, add a kubelet config file in a k3s-associated location such as /etc/rancher/k3s/kubelet.config:
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
maxPods: 250
Edit /etc/systemd/system/k3s.service to change the k3s server args:
ExecStart=/usr/local/bin/k3s \
server \
'--disable' \
'servicelb' \
'--disable' \
'traefik' \
'--kubelet-arg=config=/etc/rancher/k3s/kubelet.config'
Reload systemd to pick up the service change:
sudo systemctl daemon-reload
Restart k3s:
sudo systemctl restart k3s
Check the output of kubectl describe node <node> and look for the allocatable resources:
Allocatable:
cpu: 32
ephemeral-storage: 199789251223
hugepages-1Gi: 0
hugepages-2Mi: 0
memory: 131811756Ki
pods: 250
and a message in Events noting that the node allocatable limit has been updated:
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Starting 20m kube-proxy Starting kube-proxy.
Normal Starting 20m kubelet Starting kubelet.
...
Normal NodeNotReady 7m52s kubelet Node <node> status is now: NodeNotReady
Normal NodeAllocatableEnforced 7m50s kubelet Updated Node Allocatable limit across pods
Normal NodeReady 7m50s kubelet Node <node> status is now: NodeReady
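As a side note, newer k3s releases (v1.19.1+) also read a config file at /etc/rancher/k3s/config.yaml, so the same kubelet argument can be supplied there instead of editing the systemd unit (a sketch; verify against your k3s version):
# /etc/rancher/k3s/config.yaml
kubelet-arg:
  - "config=/etc/rancher/k3s/kubelet.config"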
As described in the documentation, it is possible to set the kubelet's configuration parameters via
an on-disk config file.
NOTE: In an on-disk config file we set only the subset of the kubelet's configuration parameters that we want to override; all other kubelet configuration values are left at their built-in defaults, unless overridden by flags.
I've created a simple config file to override the maxPods value (default 110):
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
maxPods: 250
Then we have to pass this config file as an argument during K3s installation (I recommend specifying an absolute path):
curl -sfL https://get.k3s.io | INSTALL_K3S_EXEC="--kubelet-arg=config=<KUBELET_CONFIG_FILE_LOCATION>" sh -
Finally we can check that maxPods is equal to 250:
# kubectl describe nodes <NODE_NAME> | grep -i pod
pods: 250
pods: 250
PodCIDR: 10.42.0.0/24
PodCIDRs: 10.42.0.0/24
In addition, you can find an interesting discussion here; I believe it describes another way to solve your issue.
I've created my local registry:
$ docker container run -d \
  --name registry.localhost \
  --restart always \
  -p 5000:5000 \
  registry:2
It's up and running:
$ curl -s registry.localhost:5000/v2/_catalog | jq
{
"repositories": [
"greenplum-for-kubernetes",
"greenplum-operator"
]
}
I'm trying to create a deployment. However, I'm getting:
4m7s Normal ScalingReplicaSet deployment/greenplum-operator Scaled up replica set greenplum-operator-76b544fbb9 to 1
4m7s Normal SuccessfulCreate replicaset/greenplum-operator-76b544fbb9 Created pod: greenplum-operator-76b544fbb9-pm7t2
<unknown> Normal Scheduled pod/greenplum-operator-76b544fbb9-pm7t2 Successfully assigned default/greenplum-operator-76b544fbb9-pm7t2 to k3d-k3s-default-agent-0
3m23s Normal Pulling pod/greenplum-operator-76b544fbb9-pm7t2 Pulling image "registry.localhost:5000/greenplum-operator:v2.2.0"
3m23s Warning Failed pod/greenplum-operator-76b544fbb9-pm7t2 Error: ErrImagePull
3m23s Warning Failed pod/greenplum-operator-76b544fbb9-pm7t2 Failed to pull image "registry.localhost:5000/greenplum-operator:v2.2.0": rpc error: code = Unknown desc = failed to pull and unpack image "registry.localhost:5000/greenplum-operator:v2.2.0": failed to resolve reference "registry.localhost:5000/greenplum-operator:v2.2.0": failed to do request: Head https://registry.localhost:5000/v2/greenplum-operator/manifests/v2.2.0: http: server gave HTTP response to HTTPS client
3m1s Warning Failed pod/greenplum-operator-76b544fbb9-pm7t2 Error: ImagePullBackOff
3m1s Normal BackOff pod/greenplum-operator-76b544fbb9-pm7t2 Back-off pulling image "registry.localhost:5000/greenplum-operator:v2.2.0"
In short:
http: server gave HTTP response to HTTPS client
My cluster is up and running as well:
$ k3d cluster create --agents 2 --k3s-server-arg --disable=traefik \
  --volume $HOME/.k3d/registries.yaml:/etc/rancher/k3s/my-registries.yaml
As you can see:
$ cat ${HOME}/.k3d/registries.yaml
mirrors:
"registry.localhost:5000":
endpoint:
- "http://registry.localhost:5000"
Any ideas?
Do this on your workers:
Create or modify /etc/docker/daemon.json
{ "insecure-registries":["registry.localhost:5000"] }
Restart the Docker daemon:
sudo service docker restart
Source: https://github.com/docker/distribution/issues/1874
The problem was related to k3s, and I had made a typo.
k3s needs access to the /etc/rancher/k3s/registries.yaml file, as you can see here.
The problem was that I mounted the file as my-registries.yaml instead of registries.yaml:
$ k3d cluster create --agents 2 --k3s-server-arg --disable=traefik \
  --volume $HOME/.k3d/registries.yaml:/etc/rancher/k3s/my-registries.yaml
This solved the problem:
$ k3d cluster create --agents 2 --k3s-server-arg --disable=traefik \
  --volume $HOME/.k3d/registries.yaml:/etc/rancher/k3s/registries.yaml
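To verify that the mirror configuration was picked up, you can inspect the containerd registry configuration that k3s generates inside a node container (the path assumes k3s defaults; the node name is taken from the events above):
$ docker exec k3d-k3s-default-agent-0 cat /var/lib/rancher/k3s/agent/etc/containerd/config.toml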
I have installed a two-node Kubernetes 1.12.1 cluster on cloud VMs, both behind an internet proxy. Each VM has a floating IP associated for connecting over SSH; kube-01 is the master and kube-02 is a node. Before running kubeadm init I executed:
export no_proxy=127.0.0.1,localhost,10.157.255.185,192.168.0.153,kube-02,192.168.0.25,kube-01
but I am getting the following status from kubectl get nodes:
NAME STATUS ROLES AGE VERSION
kube-01 NotReady master 89m v1.12.1
kube-02 NotReady <none> 29s v1.12.2
Am I missing any configuration? Do I need to add 192.168.0.153 and 192.168.0.25 to the respective VMs' /etc/hosts?
It looks like a pod network is not installed on your cluster yet. You can install Weave Net, for example, with the command below:
kubectl apply -f "https://cloud.weave.works/k8s/net?k8s-version=$(kubectl version | base64 | tr -d '\n')"
After a few seconds, a Weave Net pod should be running on each Node and any further pods you create will be automatically attached to the Weave network.
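To confirm the network pods came up (the label below assumes the standard Weave Net DaemonSet manifest):
$ kubectl get pods -n kube-system -l name=weave-net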
You can install a pod network of your choice; here is a list.
After this, check:
$ kubectl describe nodes
and verify that all is fine, like below:
Conditions:
Type Status
---- ------
OutOfDisk False
MemoryPressure False
DiskPressure False
Ready True
Capacity:
cpu: 2
memory: 2052588Ki
pods: 110
Allocatable:
cpu: 2
memory: 1950188Ki
pods: 110
Next, SSH to the node which is not ready and observe the kubelet logs. The most likely errors concern certificates and authentication.
On systemd hosts you can use journalctl to check for kubelet errors:
$ journalctl -u kubelet
Try this: your CoreDNS is in a pending state. Check the networking plugin you have used and make sure the proper add-ons are installed.
Check the Kubernetes troubleshooting guide:
https://kubernetes.io/docs/setup/independent/troubleshooting-kubeadm/#coredns-or-kube-dns-is-stuck-in-the-pending-state
https://kubernetes.io/docs/concepts/cluster-administration/addons/
Install the add-ons described there, and then check:
kubectl get pods -n kube-system
On the off chance it might be the same for someone else: in my case, I was using the wrong AMI image to create the node group.
Run
journalctl -u kubelet
Then check the node logs; if you get the error below, disable swap using swapoff -a:
"Failed to run kubelet" err="failed to run Kubelet: running with swap on is not supported, please disable swap! or set --fa
Main process exited, code=exited, status=1/FAILURE
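To keep swap disabled across reboots, comment out the swap entries in /etc/fstab as well (a sketch; adjust to your distribution):
sudo swapoff -a
# comment out every swap line in /etc/fstab so swap stays off after a reboot
sudo sed -i '/ swap / s/^/#/' /etc/fstab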
I am using minikube on Linux to get started with Kubernetes. Following the examples in the README and using the none vm-driver, I do the following:
$ minikube start --vm-driver=none
Starting local Kubernetes v1.9.0 cluster...
Starting VM...
Getting VM IP address...
Moving files into cluster...
Setting up certs...
Connecting to cluster...
Setting up kubeconfig...
Starting cluster components...
Kubectl is now configured to use the cluster.
===================
WARNING: IT IS RECOMMENDED NOT TO RUN THE NONE DRIVER ON PERSONAL WORKSTATIONS
The 'none' driver will run an insecure kubernetes apiserver as root that may leave the host vulnerable to CSRF attacks
When using the none driver, the kubectl config and credentials generated will be root owned and will appear in the root home directory.
You will need to move the files to the appropriate location and then set the correct permissions. An example of this is below:
sudo mv /root/.kube $HOME/.kube # this will write over any previous configuration
sudo chown -R $USER $HOME/.kube
sudo chgrp -R $USER $HOME/.kube
sudo mv /root/.minikube $HOME/.minikube # this will write over any previous configuration
sudo chown -R $USER $HOME/.minikube
sudo chgrp -R $USER $HOME/.minikube
This can also be done automatically by setting the env var CHANGE_MINIKUBE_NONE_USER=true
Loading cached images from config file.
$ kubectl get nodes
No resources found.
$ kubectl run hello-minikube --image=k8s.gcr.io/echoserver:1.4 --port=8080
deployment "hello-minikube" created
$ kubectl expose deployment hello-minikube --type=NodePort
service "hello-minikube" exposed
$ kubectl get pod
NAME READY STATUS RESTARTS AGE
hello-minikube-c6c6764d-h64t8 0/1 Pending 0 3m
Now, the problem is that this pod remains Pending. It looks like there are no nodes to run it on, but I do not know why. Where am I going wrong?
EDIT: Here is the output of describing the pod:
$ kubectl describe pod hello-minikube-c6c6764d-h64t8
Name: hello-minikube-c6c6764d-h64t8
Namespace: default
Node: <none>
Labels: pod-template-hash=72723208
run=hello-minikube
Annotations: <none>
Status: Pending
IP:
Controlled By: ReplicaSet/hello-minikube-c6c6764d
Containers:
hello-minikube:
Image: k8s.gcr.io/echoserver:1.4
Port: 8080/TCP
Environment: <none>
Mounts:
/var/run/secrets/kubernetes.io/serviceaccount from default-token-dw4j7 (ro)
Conditions:
Type Status
PodScheduled False
Volumes:
default-token-dw4j7:
Type: Secret (a volume populated by a Secret)
SecretName: default-token-dw4j7
Optional: false
QoS Class: BestEffort
Node-Selectors: <none>
Tolerations: <none>
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning FailedScheduling 20h (x4 over 20h) default-scheduler no nodes available to schedule pods
Warning FailedScheduling 20h (x4 over 20h) default-scheduler no nodes available to schedule pods
Warning FailedScheduling 20h (x5 over 20h) default-scheduler no nodes available to schedule pods
Warning FailedScheduling 19h (x5 over 19h) default-scheduler no nodes available to schedule pods
Warning FailedScheduling 19h (x5 over 19h) default-scheduler no nodes available to schedule pods
Warning FailedScheduling 18h (x5 over 18h) default-scheduler no nodes available to schedule pods
Warning FailedScheduling 1s (x5 over 16s) default-scheduler no nodes available to schedule pods
Try running minikube status. If you get the error below:
"X Error getting host status: load: filestore: open /home/docker/.minikube/machines/minikube/config.json: permission denied
*
* Sorry that minikube crashed. If this was unexpected, we would love to hear from you:
- https://github.com/kubernetes/minikube/issues/new/choose"
then this fix worked for me:
sudo chown -R $USER /home/docker/.minikube/machines/minikube/config.json