How can I see all Jobs, both successful and failed? - kubernetes

Here is a transcript:
LANELSON$ kubectl --kubeconfig foo get -a jobs
No resources found.
OK, fine; even with the -a option, no jobs exist. Cool! Oh, let's just be paranoid and check for one that we know was created. Who knows? Maybe we'll learn something:
LANELSON$ kubectl --kubeconfig foo get -a job emcc-poc-emcc-broker-mp-populator
NAME DESIRED SUCCESSFUL AGE
emcc-poc-emcc-broker-mp-populator 1 0 36m
Er, um, what?
In this second case, I just happen to know the name of a job that was created, so I ask for it directly. I would have thought that kubectl get -a jobs would have returned it in its output. Why doesn't it?
Of course what I'd really like to do is get the logs of one of the pods that the job created, but kubectl get -a pods doesn't show any of that job's terminated pods either, and of course I don't know the name of any of the pods that the job would have spawned.
What is going on here?
Kubernetes 1.7.4 if it matters.

The answer is that Istio automatic sidecar injection happened to be "on" in the environment (I had no idea, nor should I have). When this happens, you can opt out of it, but otherwise all workloads are affected by default (!). If you don't opt out of it, and Istio's presence causes your Job not to be created for any reason, then your Job is technically uninitialized. If a resource is uninitialized, then it does not show up in kubectl get lists. To make an uninitialized resource show up in kubectl get lists, you need to include the --include-uninitialized option to get. So once I issued kubectl --kubeconfig foo get -a --include-uninitialized jobs, I could see the failed jobs.
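For completeness, here is the shape of the commands that finally worked, including finding the Job's pods so you can get their logs (the job-name label selector is the standard label the Job controller adds to its pods, assuming nothing in your setup strips it):
kubectl --kubeconfig foo get -a --include-uninitialized jobs
kubectl --kubeconfig foo get -a --include-uninitialized pods -l job-name=emcc-poc-emcc-broker-mp-populator
kubectl --kubeconfig foo logs <pod-name>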
My higher-level takeaway is that the initializer portion of Kubernetes, currently in alpha, is not at all ready for prime time yet.

Related

unable to create a pv due to VOL_DIR: parameter not set

I'm running rke2 version v1.22.7+rke2r2 on 3 nodes. Today I decided to reinstall my application and I'm no longer able to do it due to a problem claiming a PV.
I have never had this problem before, and I think it's due to an update of local-path-provisioner, but I'm not sure; I'm still a newbie with Kubernetes.
Anyway these are the commands I run before installing my solution:
kubectl apply -f https://raw.githubusercontent.com/rancher/local-path-provisioner/master/deploy/local-path-storage.yaml
kubectl patch storageclass local-path -p '{"metadata": {"annotations":{"storageclass.kubernetes.io/is-default-class":"true"}}}'
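A quick sanity check (not part of the original steps) to confirm the patch made local-path the default StorageClass:
kubectl get storageclass
# local-path should show "(default)" next to its name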
I omitted metallb. Then, as a test, I try to install the example specified on the local-path-provisioner site (https://github.com/rancher/local-path-provisioner):
kubectl create -f https://raw.githubusercontent.com/rancher/local-path-provisioner/master/examples/pvc/pvc.yaml
kubectl create -f https://raw.githubusercontent.com/rancher/local-path-provisioner/master/examples/pod/pod.yaml
What I see is that the PVC stays in Pending status. Then I check pod creation in the local-path-storage namespace and I see that the helper-pod-create-pvc-xxxx pod goes into an error state.
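For context, the PVC's events and the helper pods are usually the first things to check here (the PVC name local-path-pvc comes from the example manifest above):
kubectl describe pvc local-path-pvc
kubectl -n local-path-storage get pods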
I try to get some logs and the only thing I was able to grab is this:
kubectl -n local-path-storage logs helper-pod-create-pvc-dd8cecf3-d65b-48f7-9e04-d56a20573f8e -f
/script/setup: line 3: VOL_DIR: parameter not set
So it seems VOL_DIR is not set for whatever reason. But I never did any custom configuration; it has always started without problems, and to be honest I don't know what to put in the VOL_DIR env variable, or where.
I'll just answer my own question. It seems to be a bug in local-path-provisioner;
they are fixing it.
In the meantime, instead of using the latest manifest on master, which has the bug, please use v0.0.21, like this:
kubectl apply -f https://raw.githubusercontent.com/rancher/local-path-provisioner/v0.0.21/deploy/local-path-storage.yaml
I tested and it works fine.
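If you want to confirm which provisioner image is actually running after applying the v0.0.21 manifest, something like this works (the deployment name comes from that manifest):
kubectl -n local-path-storage get deployment local-path-provisioner -o jsonpath='{.spec.template.spec.containers[0].image}'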
The deploy manifest in the master branch has since been fixed.
The master branch is for development, so please use a v0.0.x tag (e.g. v0.0.21, a stable release) for production use.

How do I debug this Kubernetes coreDNS error?

What does this error from my coredns pod log mean and how do I debug it?
[ERROR] plugin/errors: 2 2858211404501823821.6843583298703021155. HINFO: read udp 192.168.27.16:47449->67.207.67.3:53: i/o timeout
The behavior is odd.
A single test pod will execute a curl command correctly, but the network will not.
Also each node is able to speak with each of the other nodes.
To my knowledge I have not changed any relevant configurations since the network last functioned "as expected."
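One way to narrow down whether this is a DNS problem or a broader network problem is to run a throwaway DNS test pod, along the lines of the Kubernetes DNS debugging docs (the image below is the one those docs use; adjust if it is unavailable in your environment):
kubectl run -it --rm dnsutils --restart=Never --image=registry.k8s.io/e2e-test-images/jessie-dnsutils:1.3 -- nslookup kubernetes.default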
UPDATE:
So I do not know if this counts as a solution, but I deleted all pods (including coreDNS) and allowed them to restart. The system now works.
I will keep this question up and mark it as solved just in case anyone does not know this nifty command (do not use on a production cluster):
kubectl delete po -A --all
Another way to do this (probably safer) is:
kubectl -n kube-system rollout restart deployment coredns
Thanks to #Richard_Bateman
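If you go the rollout-restart route, you can also wait for it to finish before re-testing DNS:
kubectl -n kube-system rollout status deployment coredns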

Knowing the replica count in Kubernetes

I'm wondering about a distributed batch job I need to run. If I use a Job/StatefulSet or whatever in K8S, is there a way for the pod itself (via an env var or whatever) to know it's 1 of X pods run for this job?
I need to chunk up some data and have each process fetch the stuff it needs.
--
I guess the StatefulSet hostname setting is one way of doing it. Is there a better option?
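For reference, a minimal sketch of the StatefulSet-hostname idea: StatefulSet pods get ordinal hostnames (worker-0, worker-1, ...), so each pod can derive its own index from its hostname, and you can pass the total count in yourself (the names and the TOTAL_REPLICAS variable here are made up for illustration):
# inside the container; HOSTNAME is e.g. "worker-3"
INDEX="${HOSTNAME##*-}"          # this pod's ordinal, e.g. 3
TOTAL="${TOTAL_REPLICAS:-10}"    # total pod count, injected as an env var when you create the StatefulSet
echo "processing shard ${INDEX} of ${TOTAL}"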
This is planned but not yet implemented, as far as I know. You probably want to look into higher-order layers like Argo Workflows or Airflow instead for now.
You could write some infrastructure as code using Ansible that will perform the following tasks in order:
kubectl create -f jobs.yml
kubectl wait --for=condition=complete job/job1
kubectl wait --for=condition=complete job/job2
kubectl wait --for=condition=complete job/job3
kubectl create -f pod.yml
kubectl wait can be used in situations like this to halt progress until an action has been performed; in this case, until a job has completed its run.
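Note that kubectl wait gives up after a fairly short default timeout (30 seconds, last I checked), so for long-running jobs you probably want to set it explicitly:
kubectl wait --for=condition=complete --timeout=3600s job/job1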
Here is a similar question that someone asked on StackOverflow before.

Is there any kubectl command to poll until all the pod roll to new code?

I am building a deploy pipeline. I need a kubectl command that would tell me that the rollout is complete across all the pods, so that I can deploy to the next stage.
The Deployment documentation suggests kubectl rollout status, which among other things will return a non-zero exit code if the deployment isn't complete. kubectl get deployment will print out similar information (how many replicas are expected, available, and up-to-date), and you can add a -w option to watch it.
For this purpose you can also consider using one of the Kubernetes APIs. You can "get" or "watch" the deployment object, and get back something matching the structure of a Deployment object. Using that you can again monitor the replica count, or the embedded condition list, and decide if it's ready or not. If you're using the "watch" API you'll continue to get updates as the object status changes.
The one trick here is detecting failed deployments. Say you're deploying a pod that depends on a database; usual practice is to configure the pod with the hostname you expect the database to have, and just crash (and get restarted) if it's not there yet. You can briefly wind up in CrashLoopBackOff state when this happens. If your application or deployment is totally wrong, of course, you'll also wind up in CrashLoopBackOff state, and your deployment will stop progressing. There's not an easy way to tell these two cases apart; consider an absolute timeout.
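Putting that together, a minimal pipeline gate might look like this (the deployment name my-app is a placeholder, and --timeout gives you the absolute timeout mentioned above):
kubectl rollout status deployment/my-app --timeout=300s && echo "rollout complete" || { echo "rollout failed or timed out"; exit 1; }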

Kubernetes Job Status check

Quick question regarding Kubernetes job status.
Let's assume I submit my resource to 10 pods and want to check if my job completed successfully.
What are the best available options we can use from kubectl commands?
I thought of kubectl get jobs, but the problem here is that you only have two values, 0 and 1: 1 for completed and 0 for failed or running, so we cannot really depend on this.
The other option is kubectl describe to check the pod statuses, e.g. out of 10 pods how many have completed/failed.
Is there any other effective way of monitoring the pods? Please let me know.
Anything that can talk to the Kubernetes API can query for Job object and look at the JobStatus field, which has info on which pods are running, completed, failed, or unavailable. kubectl is probably the easiest, as you mentioned, but you could write something more specialized using any client library if you wanted/needed to.
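As a concrete example of querying JobStatus from the command line (the job name myjob is a placeholder):
kubectl get job myjob -o jsonpath='{.status.succeeded}'   # count of successfully completed pods
kubectl get job myjob -o jsonpath='{.status.failed}'      # count of failed pods
kubectl wait --for=condition=complete job/myjob --timeout=600s   # block until the Job reports completion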
I was able to achieve it by running the following, which will show the pod statuses for the job:
kubectl describe job <job-name>