Removing DeploymentConfig when Pods are of a certain age using Ansible - kubernetes

I'm using Ansible to rollout a DeploymentConfig and all its sub components like Service, Route and PersistentVolumeClaim on a Kubernetes Cluster on Openshift.
Now I'd like to remove all these components from the cluster if Pods are of a certain age.
How would I do this with Ansible? I know there's a module called k8s_info, but how do I query the Pods and use the result to set the status of all items to absent.
Setup of my Play so far:
---
- name: Remove old deployments
module_defaults:
k8s:
validate_certs: no
hosts: localhost
tasks:
- name: Get a list of all service objects
k8s_info:
api_version: v1
kind: Pod
namespace: "{{ PROJECT }}"
field_selectors:
- status.phase=Running
# filter on pods older than 5 days
register: podlist
# use podlist to get metadata names for which to set state to absent

There must be a cleaner/easier approach but here is a workable alternative :
You can create a job along with your resources, which keeps track of the age of the pod and delete all the resources along with itself when time comes (condition met). The job should run with a service account that has access to do so.

Related

how to get all replicaset names inside a container

Consider the following example provided in this doc.
What I'm trying to achieve is to see the 3 replicas names from inside the container.
following this guide I was able to get the current pod name, but i need also the pod names from my replicas.
Ideally i would like to:
print(k8s.get_my_replicaset_names())
or
print(os.getenv("MY_REPLICASET"))
and have a result like:
[frontend-b2zdv,frontend-vcmts,frontend-wtsmm]
that is the pod names of all the container's replicas (also the current container of course) and eventually compare the current name in the name list to get my index in the list.
Is there any way to achieve this?
As you can read here, the Downward API is used to expose Pod and Container fields to a running Container:
There are two ways to expose Pod and Container fields to a running
Container:
Environment variables
Volume Files
Together, these two ways of exposing Pod and Container fields are
called the Downward API.
It is not meant to expose any information about other objects/resources such as ReplicaSet or Deployment, that manage such a Pod.
You can see exactly what fields contains the yaml manifest that describes a running Pod by executing:
kubectl get pods <pod_name> -o yaml
The example fragment of its output may look as follows:
apiVersion: v1
kind: Pod
metadata:
annotations:
<some annotations here>
...
creationTimestamp: "2020-10-08T22:18:03Z"
generateName: nginx-deployment-7bffc778db-
labels:
app: nginx
pod-template-hash: 7bffc778db
name: nginx-deployment-7bffc778db-8fzrz
namespace: default
ownerReferences: 👈
- apiVersion: apps/v1
blockOwnerDeletion: true
controller: true
kind: ReplicaSet 👈
name: nginx-deployment-7bffc778db 👈
...
As you can see, in metadata section it contains ownerReferences which in the above example contains one reference to a ReplicaSet object by which this Pod is managed. So you can get this particular ReplicaSet name pretty easily as it is part of a Pod yaml manifest.
However, you cannot get this way information about other Pods managed by this ReplicaSet .
Such information only can be obtained from the api server e.g by using kubectl client or programmatically with direct calls to the API.

Helm 3 Deployment Order of Kubernetes Service Catalog Resources

I am using Helm v3.3.0, with a Kubernetes 1.16.
The cluster has the Kubernetes Service Catalog installed, so external services implementing the Open Service Broker API spec can be instantiated as K8S resources - as ServiceInstances and ServiceBindings.
ServiceBindings reflect as K8S Secrets and contain the binding information of the created external service. These secrets are usually mapped into the Docker containers as environment variables or volumes in a K8S Deployment.
Now I am using Helm to deploy my Kubernetes resources, and I read here that...
The [Helm] install order of Kubernetes types is given by the enumeration InstallOrder in kind_sorter.go
In that file, the order does neither mention ServiceInstance nor ServiceBinding as resources, and that would mean that Helm installs these resource types after it has installed any of its InstallOrder list - in particular Deployments. That seems to match the output of helm install --dry-run --debug run on my chart, where the order indicates that the K8S Service Catalog resources are applied last.
Question: What I cannot understand is, why my Deployment does not fail to install with Helm.
After all my Deployment resource seems to be deployed before the ServiceBinding is. And it is the Secret generated out of the ServiceBinding that my Deployment references. I would expect it to fail, since the Secret is not there yet, when the Deployment is getting installed. But that is not the case.
Is that just a timing glitch / lucky coincidence, or is this something I can rely on, and why?
Thanks!
As said in the comment I posted:
In fact your Deployment is failing at the start with Status: CreateContainerConfigError. Your Deployment is created before Secret from the ServiceBinding. It's only working as it was recreated when the Secret from ServiceBinding was available.
I wanted to give more insight with example of why the Deployment didn't fail.
What is happening (simplified in order):
Deployment -> created and spawned a Pod
Pod -> failing pod with status: CreateContainerConfigError by lack of Secret
ServiceBinding -> created Secret in a background
Pod gets the required Secret and starts
Previously mentioned InstallOrder will leave ServiceInstace and ServiceBinding as last by comment on line 147.
Example
Assuming that:
There is a working Kubernetes cluster
Helm3 installed and ready to use
Following guides:
Kubernetes.io: Instal Service Catalog using Helm
Magalix.com: Blog: Kubernetes Service Catalog
There is a Helm chart with following files in templates/ directory:
ServiceInstance
ServiceBinding
Deployment
Files:
ServiceInstance.yaml:
apiVersion: servicecatalog.k8s.io/v1beta1
kind: ServiceInstance
metadata:
name: example-instance
spec:
clusterServiceClassExternalName: redis
clusterServicePlanExternalName: 5-0-4
ServiceBinding.yaml:
apiVersion: servicecatalog.k8s.io/v1beta1
kind: ServiceBinding
metadata:
name: example-binding
spec:
instanceRef:
name: example-instance
Deployment.yaml:
apiVersion: apps/v1
kind: Deployment
metadata:
name: ubuntu
spec:
selector:
matchLabels:
app: ubuntu
replicas: 1
template:
metadata:
labels:
app: ubuntu
spec:
containers:
- name: ubuntu
image: ubuntu
command:
- sleep
- "infinity"
# part below responsible for getting secret as env variable
env:
- name: DATA
valueFrom:
secretKeyRef:
name: example-binding
key: host
Applying above resources to check what is happening can be done in 2 ways:
First method is to use timestamp from $ kubectl get RESOURCE -o yaml
Second method is to use $ kubectl get RESOURCE --watch-only=true
First method
As said previously the Pod from the Deployment couldn't start as Secret was not available when the Pod tried to spawn. After the Secret was available to use, the Pod started.
The statuses this Pod had were the following:
Pending
ContainerCreating
CreateContainerConfigError
Running
This is a table with timestamps of Pod and Secret:
| Pod | Secret |
|-------------------------------------------|-------------------------------------------|
| creationTimestamp: "2020-08-23T19:54:47Z" | - |
| - | creationTimestamp: "2020-08-23T19:54:55Z" |
| startedAt: "2020-08-23T19:55:08Z" | - |
You can get this timestamp by invoking below commands:
$ kubectl get pod pod_name -n namespace -o yaml
$ kubectl get secret secret_name -n namespace -o yaml
You can also get get additional information with:
$ kubectl get event -n namespace
$ kubectl describe pod pod_name -n namespace
Second method
This method requires preparation before running Helm chart. Open another terminal window (for this particular case 2) and run:
$ kubectl get pod -n namespace --watch-only | while read line ; do echo -e "$(gdate +"%H:%M:%S:%N")\t $line" ; done
$ kubectl get secret -n namespace --watch-only | while read line ; do echo -e "$(gdate +"%H:%M:%S:%N")\t $line" ; done
After that apply your Helm chart.
Disclaimer!
Above commands will watch for changes in resources and display them with a timestamp from the OS. Please remember that this command is only for example purposes.
The output for Pod:
21:54:47:534823000 NAME READY STATUS RESTARTS AGE
21:54:47:542107000 ubuntu-65976bb789-l48wz 0/1 Pending 0 0s
21:54:47:553799000 ubuntu-65976bb789-l48wz 0/1 Pending 0 0s
21:54:47:655593000 ubuntu-65976bb789-l48wz 0/1 ContainerCreating 0 0s
-> 21:54:52:001347000 ubuntu-65976bb789-l48wz 0/1 CreateContainerConfigError 0 4s
21:55:09:205265000 ubuntu-65976bb789-l48wz 1/1 Running 0 22s
The output for Secret:
21:54:47:385714000 NAME TYPE DATA AGE
21:54:47:393145000 sh.helm.release.v1.example.v1 helm.sh/release.v1 1 0s
21:54:47:719864000 sh.helm.release.v1.example.v1 helm.sh/release.v1 1 0s
21:54:51:182609000 understood-squid-redis Opaque 1 0s
21:54:52:001031000 understood-squid-redis Opaque 1 0s
-> 21:54:55:686461000 example-binding Opaque 6 0s
Additional resources:
Stackoverflow.com: Answer: Helm install in certain order
Alibabacloud.com: Helm charts and templates hooks and tests part 3
So to answer my own question (and thanks to #dawid-kruk and the folks on Service Catalog Sig on Slack):
In fact, the initial start of my Pods (the ones referencing the Secret created out of the ServiceBinding) fails! It fails because the Secret is actually not there the moment K8S tries to start the pods.
Kubernetes has a self-healing mechanism, in the sense that it tries (and retries) to reach the target state of the cluster as described by the various deployed resources.
By Kubernetes retrying to get the pods running, eventually (when the Secret is finally there) all conditions will be satisfied to make the pods start up nicely. Therefore, eventually, evth. is running as it should.
How could this be streamlined? One possibility would be for Helm to include the custom resources ServiceBinding and ServiceInstance into its ordered list of installable resources and install them early in the installation phase.
But even without that, Kubernetes actually deals with it just fine. The order of installation (in this case) really does not matter. And that is a good thing!

What is the recommended way to get the pods of a Kubernetes deployment?

Especially considering all the asynchronous procedures involved with creating and updating a deployment, I find it difficult to reliably find the current pods associated with the current version of a given deployment.
Currently, I do:
Add unique labels to the deployment's template.
Get the revision number of the deployment.
Get all replica sets with the labels.
Filter them further to find the one with the correct revision number.
Extract the pod template hash from the replica set.
Get all pods with the labels plus the pod template hash.
This is awkward and complex. Besides, I am not sure that (4) and (6) are guaranteed to yield only the wanted objects. But I cannot filter by ownerReferences, can I?
Is there a more robust and simpler way?
When you create Deployment, it creates ReplicaSet, which creates Pods.
ReplicaSet contains "ownerReferences" path which includes the name and the UID of the parent deployment.
Pods contain the same path with the link to the parent ReplicaSet.
Here is an example of ReplicaSet info:
# kubectl get rs nginx-deployment-569477d6d8 -o yaml
apiVersion: extensions/v1beta1
kind: ReplicaSet
...
name: nginx-deployment-569477d6d8
namespace: default
ownerReferences:
- apiVersion: extensions/v1beta1
blockOwnerDeletion: true
controller: true
kind: Deployment
name: nginx-deployment
uid: acf5fe8a-5d0e-11e8-b14f-42010a8000fc
...

Run a pod without scheduler in Kubernetes

I searched the documentation but I am unable to find out if I can run a pod in Kubernetes without Scheduler. If anyone can help with any pointers it would be helpful
Update:
I can attach a label to node and let pod stick to that label but that would involve going through the scheduler. Is there any method without daemonset and does not use scheduler.
The scheduler just sets the spec.nodeName field on the pod. You can set that to a node name yourself if you know which node you want to run your pod, though you are then responsible for ensuring the node has sufficient resources to run the pod (enough memory, free host ports, etc… all things the scheduler is normally responsible for checking before it assigns a pod to a node)
You want static pods
Static pods are managed directly by kubelet daemon on a specific node, without API server observing it. It does not have associated any replication controller, kubelet daemon itself watches it and restarts it when it crashes.
You can simply add a nodeName attribute to the pod definition which by they normally field out by the scheduler therefor it's not a mandatory field.
apiVersion: v1
kind: Pod
metadata:
name: nginx
spec:
containers:
- image: nginx
name: nginx
nodeName: node01
if the pod has been created and in pending state you have to recreate it with the new field, edit is not permitted with the nodeName att.
All the answers given here would require a scheduler to run.
I think what you want to do is create the manifest file of the pod and put it in the default manifest directory of the node in question.
Default directory is /etc/kubernetes/manifests/
The pod will automatically be created and if you wish to delete it, just delete the manifest file.
You can simply add a nodeName attribute to the pod definition
apiVersion: v1
kind: Pod
metadata:
name: nginx
spec:
nodeName: controlplane
containers:
- image: nginx
name: nginx
Now important point - check the node listed by using below command, and then assign to one of them:
kubectl get nodes

Restart pods when configmap updates in Kubernetes?

How do I automatically restart Kubernetes pods and pods associated with deployments when their configmap is changed/updated?
I know there's been talk about the ability to automatically restart pods when a config maps changes but to my knowledge this is not yet available in Kubernetes 1.2.
So what (I think) I'd like to do is a "rolling restart" of the deployment resource associated with the pods consuming the config map. Is it possible, and if so how, to force a rolling restart of a deployment in Kubernetes without changing anything in the actual template? Is this currently the best way to do it or is there a better option?
The current best solution to this problem (referenced deep in https://github.com/kubernetes/kubernetes/issues/22368 linked in the sibling answer) is to use Deployments, and consider your ConfigMaps to be immutable.
When you want to change your config, create a new ConfigMap with the changes you want to make, and point your deployment at the new ConfigMap. If the new config is broken, the Deployment will refuse to scale down your working ReplicaSet. If the new config works, then your old ReplicaSet will be scaled to 0 replicas and deleted, and new pods will be started with the new config.
Not quite as quick as just editing the ConfigMap in place, but much safer.
Signalling a pod on config map update is a feature in the works (https://github.com/kubernetes/kubernetes/issues/22368).
You can always write a custom pid1 that notices the confimap has changed and restarts your app.
You can also eg: mount the same config map in 2 containers, expose a http health check in the second container that fails if the hash of config map contents changes, and shove that as the liveness probe of the first container (because containers in a pod share the same network namespace). The kubelet will restart your first container for you when the probe fails.
Of course if you don't care about which nodes the pods are on, you can simply delete them and the replication controller will "restart" them for you.
The best way I've found to do it is run Reloader
It allows you to define configmaps or secrets to watch, when they get updated, a rolling update of your deployment is performed. Here's an example:
You have a deployment foo and a ConfigMap called foo-configmap. You want to roll the pods of the deployment every time the configmap is changed. You need to run Reloader with:
kubectl apply -f https://raw.githubusercontent.com/stakater/Reloader/master/deployments/kubernetes/reloader.yaml
Then specify this annotation in your deployment:
kind: Deployment
metadata:
annotations:
configmap.reloader.stakater.com/reload: "foo-configmap"
name: foo
...
Helm 3 doc page
Often times configmaps or secrets are injected as configuration files in containers. Depending on the application a restart may be required should those be updated with a subsequent helm upgrade, but if the deployment spec itself didn't change the application keeps running with the old configuration resulting in an inconsistent deployment.
The sha256sum function can be used together with the include function to ensure a deployments template section is updated if another spec changes:
kind: Deployment
spec:
template:
metadata:
annotations:
checksum/config: {{ include (print $.Template.BasePath "/secret.yaml") . | sha256sum }}
[...]
In my case, for some reasons, $.Template.BasePath didn't work but $.Chart.Name does:
spec:
replicas: 1
template:
metadata:
labels:
app: admin-app
annotations:
checksum/config: {{ include (print $.Chart.Name "/templates/" $.Chart.Name "-configmap.yaml") . | sha256sum }}
You can update a metadata annotation that is not relevant for your deployment. it will trigger a rolling-update
for example:
spec:
template:
metadata:
annotations:
configmap-version: 1
If k8>1.15; then doing a rollout restart worked best for me as part of CI/CD with App configuration path hooked up with a volume-mount. A reloader plugin or setting restartPolicy: Always in deployment manifest YML did not work for me. No application code changes needed, worked for both static assets as well as Microservice.
kubectl rollout restart deployment/<deploymentName> -n <namespace>
Had this problem where the Deployment was in a sub-chart and the values controlling it were in the parent chart's values file. This is what we used to trigger restart:
spec:
template:
metadata:
annotations:
checksum/config: {{ tpl (toYaml .Values) . | sha256sum }}
Obviously this will trigger restart on any value change but it works for our situation. What was originally in the child chart would only work if the config.yaml in the child chart itself changed:
checksum/config: {{ include (print $.Template.BasePath "/config.yaml") . | sha256sum }}
Consider using kustomize (or kubectl apply -k) and then leveraging it's powerful configMapGenerator feature. For example, from: https://kubectl.docs.kubernetes.io/references/kustomize/kustomization/configmapgenerator/
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
# Just one example of many...
- name: my-app-config
literals:
- JAVA_HOME=/opt/java/jdk
- JAVA_TOOL_OPTIONS=-agentlib:hprof
# Explanation below...
- SECRETS_VERSION=1
Then simply reference my-app-config in your deployments. When building with kustomize, it'll automatically find and update references to my-app-config with an updated suffix, e.g. my-app-config-f7mm6mhf59.
Bonus, updating secrets: I also use this technique for forcing a reload of secrets (since they're affected in the same way). While I personally manage my secrets completely separately (using Mozilla sops), you can bundle a config map alongside your secrets, so for example in your deployment:
# ...
spec:
template:
spec:
containers:
- name: my-app
image: my-app:tag
envFrom:
# For any NON-secret environment variables. Name is automatically updated by Kustomize
- configMapRef:
name: my-app-config
# Defined separately OUTSIDE of Kustomize. Just modify SECRETS_VERSION=[number] in the my-app-config ConfigMap
# to trigger an update in both the config as well as the secrets (since the pod will get restarted).
- secretRef:
name: my-app-secrets
Then, just add a variable like SECRETS_VERSION into your ConfigMap like I did above. Then, each time you change my-app-secrets, just increment the value of SECRETS_VERSION, which serves no other purpose except to trigger a change in the kustomize'd ConfigMap name, which should also result in a restart of your pod. So then it becomes:
I also banged my head around this problem for some time and wished to solve this in an elegant but quick way.
Here are my 20 cents:
The answer using labels as mentioned here won't work if you are updating labels. But would work if you always add labels. More details here.
The answer mentioned here is the most elegant way to do this quickly according to me but had the problem of handling deletes. I am adding on to this answer:
Solution
I am doing this in one of the Kubernetes Operator where only a single task is performed in one reconcilation loop.
Compute the hash of the config map data. Say it comes as v2.
Create ConfigMap cm-v2 having labels: version: v2 and product: prime if it does not exist and RETURN. If it exists GO BELOW.
Find all the Deployments which have the label product: prime but do not have version: v2, If such deployments are found, DELETE them and RETURN. ELSE GO BELOW.
Delete all ConfigMap which has the label product: prime but does not have version: v2 ELSE GO BELOW.
Create Deployment deployment-v2 with labels product: prime and version: v2 and having config map attached as cm-v2 and RETURN, ELSE Do nothing.
That's it! It looks long, but this could be the fastest implementation and is in principle with treating infrastructure as Cattle (immutability).
Also, the above solution works when your Kubernetes Deployment has Recreate update strategy. Logic may require little tweaks for other scenarios.
How do I automatically restart Kubernetes pods and pods associated
with deployments when their configmap is changed/updated?
If you are using configmap as Environment you have to use the external option.
Reloader
Kube watcher
Configurator
Kubernetes auto-reload the config map if it's mounted as volume (If subpath there it won't work with that).
When a ConfigMap currently consumed in a volume is updated, projected
keys are eventually updated as well. The kubelet checks whether the
mounted ConfigMap is fresh on every periodic sync. However, the
kubelet uses its local cache for getting the current value of the
ConfigMap. The type of the cache is configurable using the
ConfigMapAndSecretChangeDetectionStrategy field in the
KubeletConfiguration struct. A ConfigMap can be either propagated by
watch (default), ttl-based, or by redirecting all requests directly to
the API server. As a result, the total delay from the moment when the
ConfigMap is updated to the moment when new keys are projected to the
Pod can be as long as the kubelet sync period + cache propagation
delay, where the cache propagation delay depends on the chosen cache
type (it equals to watch propagation delay, ttl of cache, or zero
correspondingly).
Official document : https://kubernetes.io/docs/concepts/configuration/configmap/#mounted-configmaps-are-updated-automatically
ConfigMaps consumed as environment variables are not updated automatically and require a pod restart.
Simple example Configmap
apiVersion: v1
kind: ConfigMap
metadata:
name: config
namespace: default
data:
foo: bar
POD config
spec:
containers:
- name: configmaptestapp
image: <Image>
volumeMounts:
- mountPath: /config
name: configmap-data-volume
ports:
- containerPort: 8080
volumes:
- name: configmap-data-volume
configMap:
name: config
Example : https://medium.com/#harsh.manvar111/update-configmap-without-restarting-pod-56801dce3388
Adding the immutable property to the config map totally avoids the problem. Using config hashing helps in a seamless rolling update but it does not help in a rollback. You can take a look at this open-source project - 'Configurator' - https://github.com/gopaddle-io/configurator.git .'Configurator' works by the following using the custom resources :
Configurator ties the deployment lifecycle with the configMap. When
the config map is updated, a new version is created for that
configMap. All the deployments that were attached to the configMap
get a rolling update with the latest configMap version tied to it.
When you roll back the deployment to an older version, it bounces to
configMap version it had before doing the rolling update.
This way you can maintain versions to the config map and facilitate rolling and rollback to your deployment along with the config map.
Another way is to stick it into the command section of the Deployment:
...
command: [ "echo", "
option = value\n
other_option = value\n
" ]
...
Alternatively, to make it more ConfigMap-like, use an additional Deployment that will just host that config in the command section and execute kubectl create on it while adding an unique 'version' to its name (like calculating a hash of the content) and modifying all the deployments that use that config:
...
command: [ "/usr/sbin/kubectl-apply-config.sh", "
option = value\n
other_option = value\n
" ]
...
I'll probably post kubectl-apply-config.sh if it ends up working.
(don't do that; it looks too bad)