Unable to run kubectl from ansible - kubernetes

I'm trying to run a kubectl command from ansible.
Basically the command will tell me if at least one pod is running from a deployment.
kubectl get deploy sample-v1-deployment -o json -n sample | jq '.status.conditions[] | select(.reason == "MinimumReplicasAvailable") | .status' | tr -d '"'
I tried to run it from a playbook but I'm getting
Unable to connect to the server: net/http: TLS handshake timeout
This is my playbook:
- hosts: master
  gather_facts: no
  become: true
  tasks:
    - name: test command
      shell: kubectl get deploy sample-v1-deployment -o json -n sample | jq '.status.conditions[] | select(.reason == "MinimumReplicasAvailable") | .status' | tr -d '"'
      register: result
This is the output from ansible:
changed: [k8smaster01.test.com] => {
    "changed": true,
    "cmd": "kubectl get deploy sample-v1-deployment -o json -n sample | jq '.status.conditions[] | select(.reason == \"MinimumReplicasAvailable\") | .status' | tr -d '\"'",
    "delta": "0:00:10.507704",
    "end": "2019-04-02 20:59:17.882277",
    "invocation": {
        "module_args": {
            "_raw_params": "kubectl get deploy sample-v1-deployment -o json -n sample | jq '.status.conditions[] | select(.reason == \"MinimumReplicasAvailable\") | .status' | tr -d '\"'",
            "_uses_shell": true,
            "argv": null,
            "chdir": null,
            "creates": null,
            "executable": null,
            "removes": null,
            "stdin": null,
            "warn": true
        }
    },
    "rc": 0,
    "start": "2019-04-02 20:59:07.374573",
    "stderr": "Unable to connect to the server: net/http: TLS handshake timeout",
    "stderr_lines": [
        "Unable to connect to the server: net/http: TLS handshake timeout"
    ],
    "stdout": "",
    "stdout_lines": []
}
I can run the command manually on the master server without problems. I was also able to use the k8s module to create different things on my Kubernetes cluster.
I know there is a kubectl module for Ansible; could that be the problem?
Thanks

I found a couple of workarounds.
One was to use the k8s_facts module
- name: Ensure running application
  k8s_facts:
    namespace: sample
    kind: Pod
    label_selectors:
      - app=sample-v1-app
  register: pod_list
  until: pod_list.resources[0].status.phase == 'Running'
  delay: 10
  retries: 3
It's simple and gets the work done.
The second workaround was to use the raw module instead of shell or command
- name: Get running status
  raw: kubectl get deploy sample-v1-deployment -o json -n sample | jq -r '.status.conditions[] | select(.reason == "MinimumReplicasAvailable") | .status'
I'm not sure about using raw; it looks like a hammer for a simple task.
But reading about the module makes me think this problem is related to the syntax (quotes, double quotes, |) more than to the command itself:
Executes a low-down and dirty SSH command, not going through the
module subsystem. This is useful and should only be done in a few
cases. A common case is installing python on a system without python
installed by default. Another is speaking to any devices such as
routers that do not have any Python installed.
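If the root cause really is the quoting, one way to sidestep it is to let kubectl do the filtering itself with jsonpath instead of piping through jq. This is only a sketch of the same check, untested from Ansible:
kubectl get deploy sample-v1-deployment -n sample -o jsonpath='{.status.conditions[?(@.reason=="MinimumReplicasAvailable")].status}'
Since everything sits inside one pair of single quotes, there is nothing for the shell module to mangle.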

It looks like you can connect to your kube-apiserver on the master from a shell, but not from Ansible. The error message suggests the two are not using the same kubeconfig.
You can see the kube-apiserver endpoint configured on your ~/.kube/config like this:
$ kubectl config view --minify -o jsonpath='{.clusters[].cluster.server}'
It's typically something like this: https://<servername>:6443. You can try running the command from ansible to see if you get the same kube-apiserver.
Another thing you can try is to print the value of the KUBECONFIG environment variable from Ansible to see if it's set to something different from ~/.kube/config.
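For example, ad-hoc calls along these lines (the master host group and the become flag are assumptions taken from the playbook above) would show what kubectl sees when it runs the way Ansible runs it:
# kube-apiserver endpoint as resolved on the remote host
ansible master -b -m shell -a "kubectl config view --minify -o jsonpath='{.clusters[].cluster.server}'"
# KUBECONFIG value and the kubeconfig the remote user picks up by default
ansible master -b -m shell -a 'echo "KUBECONFIG=${KUBECONFIG:-unset}"; ls -l ~/.kube/config'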
Hope it helps!

Related

searching for a keyword in all the pods/replicas of a kubernetes deployment

I am running a deployment called mydeployment that manages several pods/replicas for a certain service. I want to search all the service pods/instances/replicas of that deployment for a certain keyword. The command below defaults to one replica and returns matches for the keyword from that replica only.
kubectl logs -f deploy/mydeployment | grep "keyword"
Is it possible to customize the above command to return all the matching keywords out of all instances/pods of the deployment mydeployment? Any hint?
Save this to a file fetchLogs.sh, and if you are on a Linux box, run it with sh fetchLogs.sh
#!/bin/sh
podName="key-word-from-pod-name"
keyWord="actual-log-search-keyword"
nameSpace="name-space-where-the-pods-are-running"

echo "Running Script.."
# loop over every pod in the namespace whose name matches ${podName}
for pod in $(kubectl get pods -n ${nameSpace} -o name | grep -i ${podName} | cut -d'/' -f2);
do
    echo "searching pod ${pod}"
    kubectl -n ${nameSpace} logs pod/${pod} | grep -i ${keyWord}
done
I used pods here; if you want to use a deployment, the idea is the same, just change the kubectl command accordingly.
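If the deployment's pods share a label (app=mydeployment below is just an assumed example; check the deployment's .spec.selector for the real one), a label selector can replace the loop entirely:
# --prefix marks each line with the pod/container it came from (kubectl 1.17+)
kubectl logs -n <namespace> -l app=mydeployment --all-containers=true --prefix --tail=-1 | grep -i "keyword"
Note that kubectl logs with a selector only reads a handful of pods by default; raise the limit with --max-log-requests if the deployment has many replicas.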

Get count of Kubernetes pods that aren't running

I've this command to list the Kubernetes pods that are not running:
sudo kubectl get pods -n my-name-space | grep -v Running
Is there a command that will return a count of pods that are not running?
If you add ... | wc -l to the end of that command, it will print the number of lines that the grep command outputs. This will probably include the title line, but you can suppress that.
kubectl get pods -n my-name-space --no-headers \
| grep -v Running \
| wc -l
If you have a JSON-processing tool like jq available, you can get more reliable output (the grep invocation will get an incorrect answer if an Evicted pod happens to have the string Running in its name). You should be able to do something like (untested)
kubectl get pods -n my-namespace -o json \
| jq '.items | map(select(.status.phase != "Running")) | length'
If you'll be doing a lot of this, writing a non-shell program using the Kubernetes API will be more robust; you will generally be able to do an operation like "get pods" using an SDK call and get back a list of pod objects that you can filter.
You can do it without any external tool:
kubectl get po \
--field-selector=status.phase!=Running \
-o go-template='{{len .items}}'
the filtering is done with field-selectors
the counting is done with go-template: {{ len .items }}
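If you only want the count for the namespace from the question rather than the current context's default namespace, the same idea with -n would be:
kubectl get po -n my-name-space \
  --field-selector=status.phase!=Running \
  -o go-template='{{len .items}}'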

Trying to get rid of orphan volumes in heketi results in error without reason

I'm trying to get rid of a bunch of orphaned volumes in heketi. When I try, I get "Error" and then information about the volume I just tried to delete, serialized as JSON. There's nothing else. I've tried to dig into the logs but they don't reveal anything.
This is the command I used to try and delete the volume:
heketi-cli -s $HEKETI_CLI_SERVER --user admin --secret ${KEY} volume delete 22f1a960651f0f16ada20a15d68c7dd6
Error: {"size":30,"name":"vol_22f1a960651f0f16ada20a15d68c7dd6","durability":{"type":"none","replicate":{},"disperse":{}},"gid":2008,"glustervolumeoptions":["","cluster.post-op-delay-secs 0"," performance.client-io-threads off"," performance.open-behind off"," performance.readdir-ahead off"," performance.read-ahead off"," performance.stat-prefetch off"," performance.write-behind off"," performance.io-cache off"," cluster.consistent-metadata on"," performance.quick-read off"," performance.strict-o-direct on"," storage.health-check-interval 0",""],"snapshot":{"enable":true,"factor":1},"id":"22f1a960651f0f16ada20a15d68c7dd6","cluster":"e924a50aa93d9eae1132c60eb1f36310","mount":{"glusterfs":{"hosts":["<SECRET>"],"device":"<SECRET>:vol_22f1a960651f0f16ada20a15d68c7dd6","options":{"backup-volfile-servers":""}}},"blockinfo":{},"bricks":[{"id":"0f4c6d7f605e9368bfe3dc7cc117b69a","path":"/var/lib/heketi/mounts/vg_970f0faf60f8dfc6f6a0d6bd25bdea7c/brick_0f4c6d7f605e9368bfe3dc7cc117b69a/brick","device":"970f0faf60f8dfc6f6a0d6bd25bdea7c","node":"107894a855c9d2c34509b18272e6c298","volume":"22f1a960651f0f16ada20a15d68c7dd6","size":31457280}]}
Notice that the second line only contains Error, then the info about the volume serialized as json.
The volume doesn't exist in gluster. I used the below commands to verify the volume was no longer there:
kubectl -n default exec -t -i glusterfs-rgz9g bash
gluster volume info
<shows volume I did not delete>
Kubernetes does not show a PersistentVolumeClaim or PersistentVolume:
kubectl get pvc -A
No resources found.
kubectl get pv -A
No resources found.
I tried looking at the heketi logs, but they only report a GET for the volume:
kubectl -n default logs heketi-56f678775c-nrbwd
[negroni] 2019-11-25T21:29:19Z | 200 | 1.407715ms | <SECRET>:8080 | GET /volumes/22f1a960651f0f16ada20a15d68c7dd6
[negroni] 2019-11-25T21:29:19Z | 200 | 1.111984ms | <SECRET>:8080 | GET /volumes/22f1a960651f0f16ada20a15d68c7dd6
[negroni] 2019-11-25T21:29:19Z | 200 | 1.540357ms | <SECRET>:8080 | GET /volumes/22f1a960651f0f16ada20a15d68c7dd6
I've tried setting a more verbose log level, but the setting doesn't stick:
heketi-cli -s $HEKETI_CLI_SERVER --user admin --secret ${KEY} loglevel set debug
Server log level updated
heketi-cli -s $HEKETI_CLI_SERVER --user admin --secret ${KEY} loglevel get
info
My CLI uses
heketi-cli -v
heketi-cli v9.0.0
And the Heketi server is running:
kubectl -n default exec -t -i heketi-56f678775c-nrbwd bash
heketi -v
Heketi v9.0.0-124-gc2e2a4ab
Based on the logs, I believe heketi-cli hits an issue and never actually sends the POST or DELETE request to the heketi server.
How do I proceed to debug this? At this point my only workaround is to recreate my cluster, but I'd like to avoid that, especially if something like this comes back.
It looks like there's a bug in heketi-cli, because if I manually craft the request using Ruby and curl, I'm able to delete the volume:
TOKEN=$(ruby makeToken.rb DELETE /volumes/22f1a960651f0f16ada20a15d68c7dd6)
curl -X DELETE -H "Authorization: Bearer $TOKEN" http://10.233.21.178:8080/volumes/22f1a960651f0f16ada20a15d68c7dd6
Please see https://github.com/heketi/heketi/blob/master/docs/api/api.md#authentication-model for how to generate the JWT token.
I manually created the request expecting to get a better error message that the command-line tool had swallowed. It turned out the CLI was actually busted.
Ruby code for making the JWT token (makeToken.rb). You need to fill in pass and server.
#!/usr/bin/env ruby

require 'jwt'
require 'digest'

user = "admin"
pass = "<SECRET>"
server = "http://localhost:8080/"

method = "#{ARGV[0]}"
uri = "#{ARGV[1]}"

payload = {}
headers = {
  iss: 'admin',
  iat: Time.now.to_i,
  exp: Time.now.to_i + 600,
  qsh: Digest::SHA256.hexdigest("#{method}&#{uri}")
}

token = JWT.encode headers, pass, 'HS256'
print("#{token}")

Namespace "stuck" as Terminating, How I removed it

I had a "stuck" namespace that I deleted showing in this eternal "terminating" status.
Assuming you've already tried to force-delete resources like pods stuck in Terminating status, and you're at your wits' end trying to recover the namespace...
You can force-delete the namespace (perhaps leaving dangling resources):
(
NAMESPACE=your-rogue-namespace
kubectl proxy &
kubectl get namespace $NAMESPACE -o json |jq '.spec = {"finalizers":[]}' >temp.json
curl -k -H "Content-Type: application/json" -X PUT --data-binary #temp.json 127.0.0.1:8001/api/v1/namespaces/$NAMESPACE/finalize
)
This is a refinement of the answer here, which is based on the comment here.
I'm using the jq utility to programmatically delete elements in the finalizers section. You could do that manually instead.
kubectl proxy creates the listener at 127.0.0.1:8001 by default. If you know the hostname/IP of your cluster master, you may be able to use that instead.
The funny thing is that this approach works even though making the same change with kubectl edit has no effect.
This is caused by resources still existing in the namespace that the namespace controller is unable to remove.
This command (with kubectl 1.11+) will show you what resources remain in the namespace:
kubectl api-resources --verbs=list --namespaced -o name \
| xargs -n 1 kubectl get --show-kind --ignore-not-found -n <namespace>
Once you find and remove those, the namespace will be cleaned up.
As mentioned before in this thread, there is another way to terminate a namespace, using an API not exposed by kubectl, with a modern version of kubectl where kubectl replace --raw is available (not sure from which version). This way you will not have to spawn a kubectl proxy process, and you avoid a dependency on curl (which is not available in some environments like busybox). In the hope that this will help someone else, I left this here:
kubectl get namespace "stucked-namespace" -o json \
| tr -d "\n" | sed "s/\"finalizers\": \[[^]]\+\]/\"finalizers\": []/" \
| kubectl replace --raw /api/v1/namespaces/stucked-namespace/finalize -f -
You need to remove the kubernetes finalizer.
Step 1:
kubectl get namespace <YOUR_NAMESPACE> -o json > <YOUR_NAMESPACE>.json
Remove kubernetes from the finalizers array, which is under spec.
Step 2:
kubectl replace --raw "/api/v1/namespaces/<YOUR_NAMESPACE>/finalize" -f ./<YOUR_NAMESPACE>.json
Step 3:
kubectl get namespace
You can see that the annoying namespace is gone.
Solution:
Use the command below without any changes; it works like a charm.
NS=`kubectl get ns |grep Terminating | awk 'NR==1 {print $1}'` && kubectl get namespace "$NS" -o json | tr -d "\n" | sed "s/\"finalizers\": \[[^]]\+\]/\"finalizers\": []/" | kubectl replace --raw /api/v1/namespaces/$NS/finalize -f -
Enjoy
I loved this answer extracted from here
It is just 2 commands.
In one terminal:
kubectl proxy
In another terminal:
kubectl get ns delete-me -o json | \
jq '.spec.finalizers=[]' | \
curl -X PUT http://localhost:8001/api/v1/namespaces/delete-me/finalize -H "Content-Type: application/json" --data @-
Single line command
kubectl patch ns <Namespace_to_delete> -p '{"metadata":{"finalizers":null}}'
Simple trick
You can edit the namespace from the console: kubectl edit <namespace name>, remove/delete "kubernetes" from inside the finalizers section (it should look like "finalizers": [ ]), and press enter or save/apply the changes.
That way you can do it in one step.
Trick 1:
kubectl get namespace annoying-namespace-to-delete -o json > tmp.json
Then edit tmp.json and remove "kubernetes" from finalizers.
Open another terminal, run kubectl proxy, and run the curl below:
curl -k -H "Content-Type: application/json" -X PUT --data-binary @tmp.json http://localhost:8001/api/v1/namespaces/<NAMESPACE NAME TO DELETE>/finalize
and it should delete your namespace.
Step by step guide
Start the proxy using the command:
kubectl proxy &
Starting to serve on 127.0.0.1:8001
find namespace
kubectl get ns
{Your namespace name} Terminating 1d
put it in file
kubectl get namespace {Your namespace name} -o json > tmp.json
edit the file tmp.json and remove the finalizers
}, "spec": { "finalizers": [ "kubernetes" ] },
after editing it should look like this
}, "spec": { "finalizers": [ ] },
We are almost there; now simply run the curl, updating the namespace value in it:
curl -k -H "Content-Type: application/json" -X PUT --data-binary @tmp.json http://127.0.0.1:8001/api/v1/namespaces/{Your namespace name}/finalize
and it's gone
For us it was the metrics-server crashing.
To check if this is relevant to your case, run the following: kubectl api-resources
If you get
error: unable to retrieve the complete list of server APIs: metrics.k8s.io/v1beta1: the server is currently unable to handle the request
Then it's probably the same issue.
Credit goes to @javierprovecho here
I've written a one-liner Python3 script based on the common answers here. This script removes the finalizers in the problematic namespace.
python3 -c "namespace='<my-namespace>';import atexit,subprocess,json,requests,sys;proxy_process = subprocess.Popen(['kubectl', 'proxy']);atexit.register(proxy_process.kill);p = subprocess.Popen(['kubectl', 'get', 'namespace', namespace, '-o', 'json'], stdout=subprocess.PIPE);p.wait();data = json.load(p.stdout);data['spec']['finalizers'] = [];requests.put('http://127.0.0.1:8001/api/v1/namespaces/{}/finalize'.format(namespace), json=data).raise_for_status()"
💡 Replace namespace='<my-namespace>' with your namespace.
e.g. namespace='trust'
Full script: https://gist.github.com/jossef/a563f8651ec52ad03a243dec539b333d
Run kubectl get apiservice
In the output of the above command you will find an apiservice with the Available flag set to False.
So, just delete that apiservice using kubectl delete apiservice <apiservice name>
After doing this, the namespace with terminating status will disappear.
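Put together, the check-and-delete might look like this (the apiservice name below is only an example; substitute whatever the first command reports as unavailable):
kubectl get apiservice | grep False
kubectl delete apiservice v1beta1.metrics.k8s.io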
Forcefully deleting the namespace or removing finalizers is definitely not the way to go, since it could leave resources registered to a non-existent namespace.
This is often fine but then one day you won't be able to create a resource because it is still dangling somewhere.
The upcoming Kubernetes version 1.16 should give more insight into namespace finalizers; for now I would rely on identification strategies.
A cool script which tries to automate these is: https://github.com/thyarles/knsk
However, it works across all namespaces, and that could be dangerous. The solution it is based on is: https://github.com/kubernetes/kubernetes/issues/60807#issuecomment-524772920
tl;dr
Checking if any apiservice is unavailable and hence doesn't serve its resources: kubectl get apiservice|grep False
Finding all resources that still exist via kubectl api-resources --verbs=list --namespaced -o name | xargs -n 1 kubectl get -n $your-ns-to-delete
(credit: https://github.com/kubernetes/kubernetes/issues/60807#issuecomment-524772920)
I wrote a simple script to delete your stuck namespace, based on @Shreyangi Saxena's solution.
cat > delete_stuck_ns.sh << "EOF"
#!/usr/bin/env bash

function delete_namespace () {
    echo "Deleting namespace $1"
    kubectl get namespace $1 -o json > tmp.json
    sed -i 's/"kubernetes"//g' tmp.json
    kubectl replace --raw "/api/v1/namespaces/$1/finalize" -f ./tmp.json
    rm ./tmp.json
}

TERMINATING_NS=$(kubectl get ns | awk '$2=="Terminating" {print $1}')

for ns in $TERMINATING_NS
do
    delete_namespace $ns
done
EOF
chmod +x delete_stuck_ns.sh
This script detects all namespaces in the Terminating state and deletes them.
PS:
This may not work on macOS, because the native sed on macOS is not compatible with GNU sed.
You may need to install GNU sed on your Mac; refer to this answer.
Please confirm that you can access your Kubernetes cluster through the kubectl command.
This has been tested on Kubernetes v1.15.3.
Update
I found an easier solution:
kubectl patch RESOURCE NAME -p '{"metadata":{"finalizers":[]}}' --type=merge
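Applied to a namespace stuck in Terminating, that generic form would look something like this (the namespace name is a placeholder):
kubectl patch namespace my-stuck-namespace --type=merge -p '{"metadata":{"finalizers":[]}}'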
Here is yet another solution. It uses jq to remove the finalizers block from the JSON, and does not require kubectl proxy:
namespaceToDelete=blah
kubectl get namespace "$namespaceToDelete" -o json \
| jq 'del(.spec.finalizers)' \
| kubectl replace --raw /api/v1/namespaces/$namespaceToDelete/finalize -f -
Please try the command below:
kubectl patch ns <your_namespace> -p '{"metadata":{"finalizers":null}}'
1. Using Curl Command
Issue Mentioned: https://amalgjose.com/2021/07/28/how-to-manually-delete-a-kubernetes-namespace-stuck-in-terminating-state/
export NAMESPACE=<specifice-namespace>
kubectl get namespace $NAMESPACE -o json > tempfile.json
Edit the JSON file and remove all values from spec.finalizers
Save it, then run this command in a separate tab
(it must stay open in that separate tab):
kubectl proxy
And run this command back in the original tab:
curl -k -H "Content-Type: application/json" -X PUT --data-binary #tempfile.json http://127.0.0.1:8001/api/v1/namespaces/$NAMESPACE/finalize
Check whether the terminating namespace has been removed:
kubectl get namespaces
2. Using Kubectl Command
Issue Mentioned: https://aws.amazon.com/premiumsupport/knowledge-center/eks-terminated-namespaces/
Save a JSON file similar to the following:
export NAMESPACE=<specifice-namespace>
kubectl get namespace $NAMESPACE -o json > tempfile.json
Edit the JSON file and remove all values from spec.finalizers
To apply the changes, run a command similar to the following:
kubectl replace --raw "/api/v1/namespaces/$NAMESPACE/finalize" -f ./tempfile.json
Verify that the terminating namespace is removed:
kubectl get namespaces
In my case the problem was caused by custom metrics.
To know what is causing the issue, just run this command:
kubectl api-resources | grep -i false
That should tell you which API resources are causing the problem. Once identified, just delete them:
kubectl delete apiservice v1beta1.custom.metrics.k8s.io
Once deleted, the namespace should disappear.
Replace ambassador with your namespace
Check if the namespace is stuck
kubectl get ns ambassador
NAME STATUS AGE
ambassador Terminating 110d
This has been stuck for a long time.
Open an admin terminal/cmd prompt or PowerShell and run
kubectl proxy
This will start a local web server: Starting to serve on 127.0.0.1:8001
Open another terminal and run
kubectl get ns ambassador -o json >tmp.json
edit the tmp.json using vi or nano
from this
{
    "apiVersion": "v1",
    "kind": "Namespace",
    "metadata": {
        "annotations": {
            "kubectl.kubernetes.io/last-applied-configuration": "{\"apiVersion\":\"v1\",\"kind\":\"Namespace\",\"metadata\":{\"annotations\":{},\"name\":\"ambassador\"}}\n"
        },
        "creationTimestamp": "2021-01-07T18:23:28Z",
        "deletionTimestamp": "2021-04-28T06:43:41Z",
        "name": "ambassador",
        "resourceVersion": "14572382",
        "selfLink": "/api/v1/namespaces/ambassador",
        "uid": "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"
    },
    "spec": {
        "finalizers": [
            "kubernetes"
        ]
    },
    "status": {
        "conditions": [
            {
                "lastTransitionTime": "2021-04-28T06:43:46Z",
                "message": "Discovery failed for some groups, 3 failing: unable to retrieve the complete list of server APIs: compose.docker.com/v1alpha3: an error on the server (\"Internal Server Error: \\\"/apis/compose.docker.com/v1alpha3?timeout=32s\\\": Post https://0.0.0.1:443/apis/authorization.k8s.io/v1beta1/subjectaccessreviews: write tcp 0.0.0.0:53284->0.0.0.0:443: write: broken pipe\") has prevented the request from succeeding, compose.docker.com/v1beta1: an error on the server (\"Internal Server Error: \\\"/apis/compose.docker.com/v1beta1?timeout=32s\\\": Post https://10.96.0.1:443/apis/authorization.k8s.io/v1beta1/subjectaccessreviews: write tcp 0.0.0.0:5284->10.96.0.1:443: write: broken pipe\") has prevented the request from succeeding, compose.docker.com/v1beta2: an error on the server (\"Internal Server Error: \\\"/apis/compose.docker.com/v1beta2?timeout=32s\\\": Post https://0.0.0.0:443/apis/authorization.k8s.io/v1beta1/subjectaccessreviews: write tcp 1.1.1.1:2284->0.0.0.0:443: write: broken pipe\") has prevented the request from succeeding",
                "reason": "DiscoveryFailed",
                "status": "True",
                "type": "NamespaceDeletionDiscoveryFailure"
            },
            {
                "lastTransitionTime": "2021-04-28T06:43:49Z",
                "message": "All legacy kube types successfully parsed",
                "reason": "ParsedGroupVersions",
                "status": "False",
                "type": "NamespaceDeletionGroupVersionParsingFailure"
            },
            {
                "lastTransitionTime": "2021-04-28T06:43:49Z",
                "message": "All content successfully deleted",
                "reason": "ContentDeleted",
                "status": "False",
                "type": "NamespaceDeletionContentFailure"
            }
        ],
        "phase": "Terminating"
    }
}
to
{
    "apiVersion": "v1",
    "kind": "Namespace",
    "metadata": {
        "annotations": {
            "kubectl.kubernetes.io/last-applied-configuration": "{\"apiVersion\":\"v1\",\"kind\":\"Namespace\",\"metadata\":{\"annotations\":{},\"name\":\"ambassador\"}}\n"
        },
        "creationTimestamp": "2021-01-07T18:23:28Z",
        "deletionTimestamp": "2021-04-28T06:43:41Z",
        "name": "ambassador",
        "resourceVersion": "14572382",
        "selfLink": "/api/v1/namespaces/ambassador",
        "uid": "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"
    },
    "spec": {
        "finalizers": []
    }
}
by deleting the status block and "kubernetes" inside finalizers.
Now use the command and replace ambassador with your namespace
curl -k -H "Content-Type: application/json" -X PUT --data-binary #tmp.json http://127.0.0.1:8001/api/v1/namespaces/ambassador/finalize
You will see another JSON response like before. Then run the command:
kubectl get ns ambassador
Error from server (NotFound): namespaces "ambassador" not found
If it still says Terminating, or you get any other error, make sure your JSON is properly formatted and try the steps again.
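A quick sanity check before re-running the curl, assuming jq is installed, is to confirm the edited file still parses as JSON:
jq . tmp.json > /dev/null && echo "tmp.json is valid JSON"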
Run the following command to view the namespaces that are stuck in the Terminating state:
kubectl get namespaces
Select a terminating namespace and view the contents of the namespace to find out the finalizer. Run the following command:
kubectl get namespace <terminating-namespace> -o yaml
Your YAML contents might resemble the following output:
apiVersion: v1
kind: Namespace
metadata:
  creationTimestamp: 2019-12-25T17:38:32Z
  deletionTimestamp: 2019-12-25T17:51:34Z
  name: <terminating-namespace>
  resourceVersion: "4779875"
  selfLink: /api/v1/namespaces/<terminating-namespace>
  uid: ******-****-****-****-fa1dfgerz5
spec:
  finalizers:
  - kubernetes
status:
  phase: Terminating
Run the following command to create a temporary JSON file:
kubectl get namespace <terminating-namespace> -o json > tmp.json
Edit your tmp.json file. Remove the kubernetes value from the finalizers field and save the file. Output would be like:
{
    "apiVersion": "v1",
    "kind": "Namespace",
    "metadata": {
        "creationTimestamp": "2018-11-19T18:48:30Z",
        "deletionTimestamp": "2018-11-19T18:59:36Z",
        "name": "<terminating-namespace>",
        "resourceVersion": "1385077",
        "selfLink": "/api/v1/namespaces/<terminating-namespace>",
        "uid": "b50c9ea4-ec2b-11e8-a0be-fa163eeb47a5"
    },
    "spec": {
    },
    "status": {
        "phase": "Terminating"
    }
}
To set a temporary proxy IP and port, run the following command. Be sure to keep your terminal window open until you delete the stuck namespace:
kubectl proxy
Your proxy IP and port might resemble the following output:
Starting to serve on 127.0.0.1:8001
From a new terminal window, make an API call with your temporary proxy IP and port:
curl -k -H "Content-Type: application/json" -X PUT --data-binary #tmp.json http://127.0.0.1:8001/api/v1/namespaces/your_terminating_namespace/finalize
Your output would be like:
{
    "kind": "Namespace",
    "apiVersion": "v1",
    "metadata": {
        "name": "<terminating-namespace>",
        "selfLink": "/api/v1/namespaces/<terminating-namespace>/finalize",
        "uid": "b50c9ea4-ec2b-11e8-a0be-fa163eeb47a5",
        "resourceVersion": "1602981",
        "creationTimestamp": "2018-11-19T18:48:30Z",
        "deletionTimestamp": "2018-11-19T18:59:36Z"
    },
    "spec": {
    },
    "status": {
        "phase": "Terminating"
    }
}
The finalizer parameter is removed. Now verify that the terminating namespace is removed by running the following command:
kubectl get namespaces
Edit:
It is not recommended to remove finalizers.
Correct approach would be:
Delete all the resources in the namespace.
Github issue link
My usual workspace is a small k8s cluster that I frequently destroy and rebuild, and that's why the finalizer-removal method works for me.
Original answer: I usually run into the same problem.
This is what I do
kubectl get ns your-namespace -o json > ns-without-finalizers.json
Edit ns-without-finalizers.json and replace all finalizers with an empty array.
Run kubectl proxy (usually in another terminal).
Then curl this command
curl -X PUT http://localhost:8001/api/v1/namespaces/your-namespace/finalize -H "Content-Type: application/json" --data @ns-without-finalizers.json
There are a couple of things you can run. What this usually means is that the automatic deletion of the namespace was not able to finish, and something is still running that has to be manually deleted. To find it, you can do these things:
Get everything attached to the namespace. If this does not return anything, move on to the next suggestion:
$ kubectl get all -n your-namespace
Some namespaces have apiservices attached to them, and those can be troublesome to delete. For that matter, it can be whatever resource you want. Delete that resource if the command finds anything:
$ kubectl get apiservice|grep False
The main takeaway is that there might be things that were not completely removed. Look at what you initially had in that namespace, and at what your YAMLs spin up, to see what is still running. Or start googling why service X won't be properly removed, and you will find things.
If the namespace is stuck in Terminating while the resources in that namespace have already been deleted, you can patch the finalizers of the namespace before deleting it:
kubectl patch ns ns_to_be_deleted -p '{"metadata":{"finalizers":null}}';
then
kubectl delete ns ns_to_be_deleted;
Edit:
Please check @Antonio Gomez Alvarado's answer first. The root cause could be the metrics server mentioned in that answer.
I tried 3-5 options to remove the namespace, but only this one worked for me.
This sh file will remove all namespaces with Terminating status
$ vi force-delete-namespaces.sh
$ chmod +x force-delete-namespaces.sh
$ ./force-delete-namespaces.sh
#!/usr/bin/env bash
set -e
set -o pipefail
kubectl proxy &
proxy_pid="$!"
trap 'kill "$proxy_pid"' EXIT
for ns in $(kubectl get namespace --field-selector=status.phase=Terminating --output=jsonpath="{.items[*].metadata.name}"); do
    echo "Removing finalizers from namespace '$ns'..."
    curl -H "Content-Type: application/json" -X PUT "127.0.0.1:8001/api/v1/namespaces/$ns/finalize" -d @- \
        < <(kubectl get namespace "$ns" --output=json | jq '.spec = { "finalizers": [] }')

    echo
    echo "Force-deleting namespace '$ns'..."
    kubectl delete namespace "$ns" --force --grace-period=0 --ignore-not-found=true
done
The only way I found to remove a "terminating" namespace is by deleting the entry inside the "finalizers" section. I tried --force delete and --grace-period=0; neither of them worked. However, this method did:
On a command line, display the info from the namespace:
$ kubectl get namespace your-rogue-namespace -o yaml
This will give you yaml output, look for a line that looks similar to this:
deletionTimestamp: 2018-09-17T13:00:10Z
finalizers:
- Whatever content it might be here...
labels:
Then simply edit the namespace configuration and delete the items inside that finalizers container.
$ kubectl edit namespace your-rogue-namespace
This will open an editor (in my case vi); I went to the line I wanted to delete and removed it by pressing the D key twice to delete the whole line.
Save it, quit your editor, and like magic the rogue namespace should be gone.
And to confirm it just:
$ kubectl get namespace your-rogue-namespace -o yaml
Completing the already great answer by nobar: if you deployed your cluster with Rancher, there is a caveat.
Rancher deployments change EVERY api call, prepending /k8s/clusters/c-XXXXX/ to the URLs.
The id of the cluster on rancher (c-XXXXX) is something you can easily get from the Rancher UI, as it will be there on the URL.
So after you get that cluster id c-xxxx, do as nobar says, just changing the API call to include that Rancher bit.
(
NAMESPACE=your-rogue-namespace
kubectl proxy &
kubectl get namespace $NAMESPACE -o json |jq '.spec = {"finalizers":[]}' >temp.json
curl -k -H "Content-Type: application/json" \
-X PUT --data-binary @temp.json \
127.0.0.1:8001/k8s/clusters/c-XXXXX/api/v1/namespaces/$NAMESPACE/finalize
)
Debugging a similar issue.
Two important things to consider:
1) Think twice before deleting finalizers from your namespace, because there might be resources you wouldn't want automatically deleted, or you at least want to understand what was deleted for troubleshooting.
2) Commands like kubectl api-resources --verbs=list might not give you resources that were created by external CRDs.
In my case:
I viewed my namespace's real state (it was stuck in Terminating) with kubectl edit ns <ns-name>, and under status -> conditions I saw that some external CRDs I had installed failed to be deleted because they have finalizers defined:
- lastTransitionTime: "2021-06-14T11:14:47Z"
  message: 'Some content in the namespace has finalizers remaining: finalizer.stackinstall.crossplane.io
    in 1 resource instances, finalizer.stacks.crossplane.io in 1 resource instances'
  reason: SomeFinalizersRemain
  status: "True"
  type: NamespaceFinalizersRemaining
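In that situation the condition message names the resources that still carry finalizers, so a gentler follow-up (kinds and names below are placeholders) is to clear the finalizers on those objects instead of on the namespace itself:
# list what is actually left in the namespace
kubectl api-resources --verbs=list --namespaced -o name \
  | xargs -n 1 kubectl get --show-kind --ignore-not-found -n <ns-name>
# then clear the finalizers on the specific offending resource
kubectl patch <kind> <name> -n <ns-name> --type=merge -p '{"metadata":{"finalizers":[]}}'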
For anyone looking for a few commands for a later version of Kubernetes, this helped me.
NAMESPACE=mynamespace
kubectl get namespace $NAMESPACE -o json | sed 's/"kubernetes"//' | kubectl replace --raw "/api/v1/namespaces/$NAMESPACE/finalize" -f -
Tested in Kubernetes v1.24.1
Something similar happened to me; in my case it was a PV and PVC, which I forcefully removed by setting the finalizers to null. Check if you can do something similar with the namespace.
kubectl patch pvc <pvc-name> -p '{"metadata":{"finalizers":null}}'
For namespaces it'd be
kubectl patch ns <ns-name> -p '{"spec":{"finalizers":null}}'
curl -k -H "Content-Type: application/json" -X PUT --data-binary #tmp.json 127.0.0.1:8001/k8s/clusters/c-mzplp/api/v1/namespaces/rook-ceph/finalize
This worked for me, the namespace is gone.
Detailed explanation can be found in the link https://github.com/rook/rook/blob/master/Documentation/ceph-teardown.md.
This happened when I interrupted a Kubernetes installation (Armory Minnaker). Then I proceeded to delete the namespace and reinstall it. I was stuck with a pod in Terminating status due to finalizers. I got the namespace into tmp.json, removed the finalizers from the tmp.json file, and ran the curl command.
Once I got past this issue, I used the scripts for uninstalling the cluster to remove the residue and did a reinstallation.
kubectl edit namespace ${stucked_namespace}
Then delete finalizers in vi mode and save.
It worked in my case.
Editing the namespace YAML manually didn't work for me; no error was thrown on editing, but the changes did not take effect.
This worked for me:
In one session:
kubectl proxy
in another shell:
kubectl get ns <rouge-ns> -o json | jq '.spec.finalizers=[]' | curl -X PUT http://localhost:8001/api/v1/namespaces/<rouge-ns>/finalize -H "Content-Type: application/json" --data @-
source: https://virtual-simon.co.uk/vsphere-kubernetes-force-deleting-stuck-terminating-namespaces-and-contexts/

Tell when Job is Complete

I'm looking for a way to tell (from within a script) when a Kubernetes Job has completed. I want to then get the logs out of the containers and perform cleanup.
What would be a good way to do this? Would the best way be to run kubectl describe job <job_name> and grep for 1 Succeeded or something of the sort?
Since version 1.11, you can do:
kubectl wait --for=condition=complete job/myjob
and you can also set a timeout:
kubectl wait --for=condition=complete --timeout=30s job/myjob
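Combined with the log collection and cleanup the question asks about, the whole flow could be as small as this sketch (job name and timeout are placeholders; note that kubectl wait exits non-zero on timeout, and a failed job never reaches the Complete condition):
kubectl wait --for=condition=complete --timeout=600s job/myjob
kubectl logs job/myjob > myjob.log   # kubectl picks one of the job's pods
kubectl delete job myjob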
You can visually watch a job's status with this command:
kubectl get jobs myjob -w
The -w option watches for changes. You are looking for the SUCCESSFUL column to show 1.
For waiting in a shell script, I'd use this command:
until kubectl get jobs myjob -o jsonpath='{.status.conditions[?(@.type=="Complete")].status}' | grep True ; do sleep 1 ; done
You can use official Python kubernetes-client.
https://github.com/kubernetes-client/python
Create new Python virtualenv:
virtualenv -p python3 kubernetes_venv
activate it with
source kubernetes_venv/bin/activate
and install kubernetes client with:
pip install kubernetes
Create new Python script and run:
from kubernetes import client, config

config.load_kube_config()
v1 = client.BatchV1Api()
ret = v1.list_namespaced_job(namespace='<YOUR-JOB-NAMESPACE>', watch=False)
for i in ret.items:
    print(i.status.succeeded)
Remember to set up your specific kubeconfig in ~/.kube/config and a valid value for your job namespace ('<YOUR-JOB-NAMESPACE>').
I would use -w or --watch:
$ kubectl get jobs.batch --watch
NAME COMPLETIONS DURATION AGE
python 0/1 3m4s 3m4s
Adding to the best answer, from a comment by @Coo: if you add a -f or --follow option when getting logs, it'll keep tailing the log and terminate when the job completes or fails. The $? status code is even non-zero when the job fails.
kubectl logs -l job-name=myjob --follow
One downside of this approach, that I'm aware of, is that there's no timeout option.
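If the missing timeout matters, one workaround is to bound the call with the coreutils timeout command, assuming it is available on the machine running kubectl:
timeout 300 kubectl logs -l job-name=myjob --follow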
Another downside is the logs call may fail while the pod is in Pending (while the containers are being started). You can fix this by waiting for the pod:
# Wait for pod to be available; logs will fail if the pod is "Pending"
while [[ "$(kubectl get pod -l job-name=myjob -o json | jq -rc '.items | .[].status.phase')" == 'Pending' ]]; do
    # Avoid flooding k8s with polls (seconds)
    sleep 0.25
done

# Tail logs
kubectl logs -l job-name=myjob --tail=400 -f
Either one of these queries with kubectl will do it:
kubectl get job test-job -o jsonpath='{.status.succeeded}'
or
kubectl get job test-job -o jsonpath='{.status.conditions[?(@.type=="Complete")].status}'
Although kubectl wait --for=condition=complete job/myjob lets us check whether the job completed, there is no way to check if the job just finished executing (irrespective of success or failure). If this is what you are looking for, a simple bash while loop with a kubectl status check did the trick for me.
#!/bin/bash
while true; do
    status=$(kubectl get job jobname -o jsonpath='{.status.conditions[0].type}')
    echo "$status" | grep -qi 'Complete' && echo "0" && exit 0
    echo "$status" | grep -qi 'Failed' && echo "1" && exit 1
    # small pause so the loop doesn't hammer the API server
    sleep 5
done