I am trying to change the sync period as mentioned in the following k8s document. I found the file kube-controller-manager.yaml in /etc/kubernetes/manifests and changed the timeoutSeconds: value from the default 15 seconds to 60 seconds. Now I have 2 questions based on the above:
Is this the right way to change the sync period? I have read in the document that there is a flag named --horizontal-pod-autoscaler-sync-period, but adding that is also not working.
Also, any changes made to kube-controller-manager.yaml are restored to the defaults whenever I restart minikube. What should I do? Please share any solution or insight on this.
To change the sync period, you have to pass it as an extra config flag when starting minikube:
minikube start --extra-config 'controller-manager.horizontal-pod-autoscaler-sync-period=10s'
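To confirm the flag was applied after the restart, you can inspect the controller-manager pod's command line; the pod name below assumes the default minikube profile:

kubectl -n kube-system describe pod kube-controller-manager-minikube | grep sync-period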
We tried to harden the GKE optimized image (gke-1.15.11) for our cluster. We SSHed into the node instance, made the CIS-proposed changes in the /home/kubernetes/kubelet-config.yaml file, and ran kube-bench to check whether all the conditions passed; around 8 conditions failed, and these were exactly the conditions we had changed in the file. We then made the same argument changes in /etc/default/kubernetes and ran kube-bench again, and the conditions passed. But when we restarted the instance, all the changes we made in the /etc/default/kubernetes file were gone. Can someone tell me where we are going wrong, or is there another path where we have to make the CIS benchmark suggested entries?
GKE doesn't support user-provided node images as of April 2020. The recommended option is to create your own DaemonSet that writes to the host filesystem and/or restarts host services to propagate the required changes.
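A minimal sketch of such a DaemonSet, assuming the kubelet config lives at /home/kubernetes/kubelet-config.yaml as in the question; the names, image, and the example edit are illustrative only:

apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: node-hardening
  namespace: kube-system
spec:
  selector:
    matchLabels:
      app: node-hardening
  template:
    metadata:
      labels:
        app: node-hardening
    spec:
      hostPID: true                  # lets us reach the host's service manager
      containers:
      - name: harden
        image: alpine:3.11
        securityContext:
          privileged: true           # required for host filesystem writes
        volumeMounts:
        - name: host-root
          mountPath: /host
        command: ["/bin/sh", "-c"]
        args:
        - |
          # example edit: cap the kubelet event QPS on the host filesystem
          sed -i 's/^eventRecordQPS:.*/eventRecordQPS: 5/' /host/home/kubernetes/kubelet-config.yaml
          # restart the kubelet in the host's namespaces so the change takes effect
          nsenter -t 1 -m -u -i -n -p systemctl restart kubelet
          # keep the pod running so the DaemonSet stays healthy
          while true; do sleep 3600; done
      volumes:
      - name: host-root
        hostPath:
          path: /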
I have a Kubernetes cluster with 1 master and 2 worker nodes. When a node goes down, it takes approximately 5 minutes for Kubernetes to notice the failure. I am using dynamic provisioning for volumes, and this delay is a bit too long for me. How can I reduce the failure detection time?
I found a post about it:
https://fatalfailure.wordpress.com/2016/06/10/improving-kubernetes-reliability-quicker-detection-of-a-node-down/
At the bottom of the post it says we can reduce the detection time by changing these parameters:
kubelet: node-status-update-frequency=4s (from 10s)
controller-manager: node-monitor-period=2s (from 5s)
controller-manager: node-monitor-grace-period=16s (from 40s)
controller-manager: pod-eviction-timeout=30s (from 5m)
I can change the node-status-update-frequency parameter on the kubelet, but I don't have any controller-manager program or command on the CLI. How can I change those parameters? Any other suggestions for reducing the detection time would be appreciated.
...but I don't have any controller-manager program or command on the
CLI. How can I change those parameters?
You can change/add those parameters in the kube-controller-manager systemd unit file and restart the daemon, as sketched below. Please check the man pages for kube-controller-manager.
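A sketch of a systemd drop-in for that; the unit name and binary path are assumptions, so verify them on your machine first, and keep whatever other flags your original ExecStart already passes:

# /etc/systemd/system/kube-controller-manager.service.d/override.conf
[Service]
ExecStart=
ExecStart=/usr/local/bin/kube-controller-manager \
  --node-monitor-period=2s \
  --node-monitor-grace-period=16s \
  --pod-eviction-timeout=30s

Then reload and restart the daemon:

sudo systemctl daemon-reload && sudo systemctl restart kube-controller-manager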
If you deploy the controller-manager as a pod (for example, a kubeadm static pod), check the manifest file for that pod and change the parameters in the container's command section, as in the sketch below.
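The relevant part of such a manifest (typically /etc/kubernetes/manifests/kube-controller-manager.yaml on kubeadm clusters; the image tag and omitted flags are illustrative) might look like:

apiVersion: v1
kind: Pod
metadata:
  name: kube-controller-manager
  namespace: kube-system
spec:
  containers:
  - name: kube-controller-manager
    image: k8s.gcr.io/kube-controller-manager:v1.18.0
    command:
    - kube-controller-manager
    - --node-monitor-period=2s
    - --node-monitor-grace-period=16s
    - --pod-eviction-timeout=30s
    # ...keep the rest of the existing flags as they are...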
It's actually kube-controller-manager. You may also decrease --attach-detach-reconcile-sync-period from 1m to 15 or 30 seconds for the kube-controller-manager. This allows for speedier volume attach/detach actions. How you change those parameters depends on how you set up the cluster.
I currently have a problem with a StatefulSet under the following conditions:
I have a Percona SQL cluster running with persistent storage and 2 nodes.
Now I force both pods to fail:
First I force pod-0 to fail.
Afterwards I force pod-1 to fail.
Now the cluster is not able to recover without manual intervention and possible data loss.
Why:
The StatefulSet tries to bring up pod-0 first, but that pod will not come online because of the following message:
[ERROR] WSREP: It may not be safe to bootstrap the cluster from this node. It was not the last one to leave the cluster and may not contain all the updates. To force cluster bootstrap with this node, edit the grastate.dat file manually and set safe_to_bootstrap to 1
What I could do alternatively, but don't really like:
I could change ".spec.podManagementPolicy" to "Parallel", but this could lead to race conditions when forming the cluster, so I would like to avoid it; I basically like the idea of starting the nodes one after another (see the snippet below for reference).
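For reference, that setting is a single field at the top level of the StatefulSet spec:

spec:
  podManagementPolicy: Parallel   # the default is OrderedReady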
What I would like to have:
the possibility to keep ".spec.podManagementPolicy": "OrderedReady" activated, but with the ability to adjust the order somehow
the ability to put specific pods into an "inactive" mode so they are ignored until I enable them again
Is something like that available? Does someone have any other ideas?
Unfortunately, nothing like that is available among the standard functions of Kubernetes.
I see only 2 options here:
Use InitContainers to check the current state on relaunch. This allows you to run any code before the primary container starts, so you can use a custom script to resolve the problem (see the sketch after this list).
Modify the database startup script so that it waits for some environment variable or a flag file, and use a PostStart hook to check the state before starting the database.
But in both options, you have to write your own logic of startup order.
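A minimal sketch of the InitContainer option, assuming the Percona data directory is mounted at /var/lib/mysql; the image, the name, and the recovery logic are placeholders:

initContainers:
- name: check-grastate
  image: busybox:1.31
  command:
  - sh
  - -c
  - |
    # inspect grastate.dat before the database container starts
    STATE=/var/lib/mysql/grastate.dat
    if [ -f "$STATE" ] && grep -q 'safe_to_bootstrap: 0' "$STATE"; then
      echo "Not the last node to leave the cluster; refusing to bootstrap."
      # your custom startup-order logic goes here, e.g. wait for a peer
    fi
  volumeMounts:
  - name: data
    mountPath: /var/lib/mysql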
I am creating an app in Origin 3.1 using my Docker image.
Whenever I deploy the image, a new pod gets created, but it restarts again and again and finally ends up with the status "CrashLoopBackOff".
I analysed the pod's logs, but they show no error; all the log output is what I would expect from a successfully running app. Hence I am not able to determine the cause.
I came across the link below today, which says "running an application inside of a container as root still has risks, OpenShift doesn't allow you to do that by default and will instead run as an arbitrary assigned user ID."
What is CrashLoopBackOff status for openshift pods?
My image runs as the root user only; what should I do to make this work? The logs show no error, but the pod keeps restarting.
Could anyone please help me with this?
You are seeing this because whatever process your image starts isn't a long-running process: it finds no TTY, so the container just exits and gets restarted repeatedly, which is a "crash loop" as far as OpenShift is concerned.
Your Dockerfile contains the following:
ENTRYPOINT ["container-entrypoint"]
What is this "container-entrypoint" actually doing? You need to check.
Did you use the -p or --previous flag with oc logs to see whether the logs from the previous attempt to start the pod show anything?
Red Hat's recommendation is to make files group-owned by GID 0, since the user in the container is always a member of the root group. You won't be able to chown, but you can selectively choose which files to make writable.
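A short Dockerfile sketch of that pattern; the /app path and the UID are illustrative:

# give the root group the same permissions the owner has on the app files
RUN chgrp -R 0 /app && \
    chmod -R g=u /app
# run as an arbitrary non-root UID; membership in group 0 grants access
USER 1001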
A second option:
In order to allow images that use either named users or the root (0) user to build in OpenShift, you can add the project’s builder service account (system:serviceaccount::builder) to the privileged security context constraint (SCC). Alternatively, you can allow all images to run as any user.
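The commands for those two approaches would look roughly like this, where myproject is a placeholder for your project name (newer versions use oc adm policy; older Origin releases used oadm policy):

oc adm policy add-scc-to-user privileged system:serviceaccount:myproject:builder
oc adm policy add-scc-to-group anyuid system:authenticated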
Can you see the logs using
kubectl logs <podname> -p
This should show you the errors explaining why the pod failed.
I was able to resolve this by creating a script named "run.sh" with the following content:
#!/bin/sh
# keep the container's main process alive so it is not treated as crashed
while :; do
  sleep 300
done
and in Dockerfile:
ADD run.sh /run.sh
RUN chmod +x /*.sh
CMD ["/run.sh"]
This way it works. Thanks everybody for pointing out the reason, which helped me find the resolution. One doubt I still have: why does the process exit in OpenShift only in this case? I have tried running a Tomcat server in the same way, and it works fine without the sleep in the script.
I have a replication controller containing a pod (with 1 replica) that takes ~10m to start. As my application grows over time, that duration is going to increase.
My problem is that when I deploy a new version, the prior pod is killed first, and only then can the new one start.
Is it possible to make Kubernetes not kill the old pod during a rolling update, until the new pod is running?
It's okay for me to have multiple replicas if it is necessary, but that did not fix the issue.
The replication controller has livenessProbe and readinessProbe set correctly.
I kept searching, and it's not possible right now (as of 13 Oct 2015), but I opened an issue you can follow: https://github.com/kubernetes/kubernetes/issues/15557