How to get all the pods that are killed by OOMKilled in AKS Log Analytics? - kubernetes

I have a Kubernetes cluster in Azure (AKS) with Log Analytics enabled. I can see that a lot of pods are being killed with the OOMKilled reason, and I want to troubleshoot this with Log Analytics in Azure. My question is: how can I track or query, from Log Analytics, all the pods that were killed with the OOMKilled reason?
Thanks!

The reason is somewhat hidden in the ContainerLastStatus field (JSON) of the KubePodInventory table. A query to get all pods killed with reason OOMKilled could be:
KubePodInventory
| where PodStatus != "running"
| extend ContainerLastStatusJSON = parse_json(ContainerLastStatus) // parse the JSON status blob
| extend FinishedAt = todatetime(ContainerLastStatusJSON.finishedAt)
| where ContainerLastStatusJSON.reason == "OOMKilled" // keep only containers whose last state was an OOM kill
| distinct PodUid, ControllerName, ContainerLastStatus, FinishedAt
| order by FinishedAt asc
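If you prefer to run the query outside the portal, the same KQL can be issued from the Azure CLI. This is only a sketch: it assumes the az monitor log-analytics query command is available (it may require the log-analytics CLI extension), and <workspace-id> is a placeholder for the Log Analytics workspace's customer ID (GUID).
# Sketch: run the OOMKilled query from the Azure CLI; <workspace-id> is a placeholder.
az monitor log-analytics query \
  --workspace <workspace-id> \
  --analytics-query 'KubePodInventory
    | where PodStatus != "running"
    | extend ContainerLastStatusJSON = parse_json(ContainerLastStatus)
    | where ContainerLastStatusJSON.reason == "OOMKilled"
    | distinct PodUid, ControllerName, ContainerLastStatus' \
  --timespan P1D \
  --output table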

Related

Airflow KubernetesPodOperator Losing Connection to Worker Pod

Experiencing an odd issue with KubernetesPodOperator on Airflow 1.10.14.
Essentially for some jobs Airflow is losing contact with the pod it creates.
[2021-02-10 07:30:13,657] {taskinstance.py:1150} ERROR - ('Connection broken: IncompleteRead(0 bytes read)', IncompleteRead(0 bytes read))
When I check logs in kubernetes with kubectl logs I can see that the job carried on past the connection broken error.
The connection broken error seems to happen exactly 1 hour after the last logs that Airflow pulls from the pod (we do have a 1 hour config on connections), but the pod keeps running happily in the background.
I've seen this behaviour repeatedly, and it tends to happen with longer running jobs with a gap in the log output, but I have no other leads. Happy to update the question if certain specifics are missing.
As I mentioned in the comments section, you can try setting the operator's get_logs parameter to False (the default value is True).
Take a look at: airflow-connection-broken, airflow-connection-issue.
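Before changing anything, it can help to confirm from the cluster side that the worker pod really did keep running after Airflow reported the broken connection. A rough sketch with kubectl; the airflow namespace and the pod name are placeholders for whatever KubernetesPodOperator created in your cluster:
# Check whether the task pod is still alive; namespace and name prefix are placeholders.
kubectl get pods -n airflow | grep <task-pod-prefix>
# Compare the pod's recent log timestamps with the time of the Airflow error.
kubectl logs -n airflow <task-pod-name> --timestamps --since=2h | tail -n 20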

How to get number of pods in AKS that were active in a given timeframe

So, I'm having an unexpectedly hard time figuring this out. I have a Kubernetes cluster deployed in AKS. In Azure (or the Kubernetes dashboard), how do I view how many active pods there were in a given time frame?
Updated 0106:
You can use the query below to count the number of active pods:
KubePodInventory
| where TimeGenerated > ago(2d) // set the time frame to 2 days
| where PodStatus == "Running"
| project PodStatus
| summarize count() by PodStatus
Original answer:
If you have configured monitoring, then you can use a Kusto query to fetch it.
The steps are as follows:
1. Go to the Azure portal -> your AKS.
2. In the left panel -> Monitoring -> click Logs.
3. In the KubePodInventory table there is a PodStatus field which you can use as a filter in your query. You can write your own Kusto query and specify the time range via the portal (by clicking the Time range button) or in the query itself (by using the ago() function). You should also use the count() function to count the number of pods.
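The same count can also be pulled from the command line. Another sketch, assuming the az monitor log-analytics query command is available and <workspace-id> again stands in for the workspace's customer ID; it uses dcount(PodUid) in case you want distinct pods rather than the number of inventory records:
# Sketch: count distinct running pods over the last 2 days from the Azure CLI.
az monitor log-analytics query \
  --workspace <workspace-id> \
  --analytics-query 'KubePodInventory
    | where TimeGenerated > ago(2d)
    | where PodStatus == "Running"
    | summarize ActivePods = dcount(PodUid)' \
  --output table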

ReadWriteMany volumes on kubernetes with terabytes of data

We want to deploy a k8s cluster which will run ~100 IO-heavy pods at the same time. They should all be able to access the same volume.
What we tried so far:
CephFS
was very complicated to set up. Hard to troubleshoot. In the end, it crashed a lot and the cause was not entirely clear.
Helm NFS Server Provisioner
runs pretty well, but when IO peaks, a single replica is not enough. We could not get multiple replicas to work at all.
MinIO
is a great tool to create storage buckets in k8s, but our operations require fs mounting. That is theoretically possible with s3fs, but since we run ~100 pods, we would additionally need to run 100 s3fs sidecars. That seems like a bad idea.
There has to be some way to get 2TB of data mounted in a GKE cluster with relatively high availability?
Firestorage seems to work, but it's an order of magnitude more expensive than other solutions, and with a lot of IO operations it quickly becomes infeasible.
I contemplated creating this question on server fault, but the k8s community is a lot smaller than SO's.
I think I have a definitive answer as of Jan 2020, at least for our use case:
| Solution | Complexity | Performance | Cost |
|-----------------|------------|-------------|----------------|
| NFS | Low | Low | Low |
| Cloud Filestore | Low | Mediocre? | Per Read/Write |
| CephFS | High* | High | Low |
* You need to add an additional step for GKE: Change the base image to ubuntu
I haven't benchmarked Filestore myself, but I'll just go with stringy05's response: others have trouble getting really good throughput from it.
Ceph could be a lot easier if it were supported by Helm.
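For reference, whichever backend you pick, the pods themselves only see a ReadWriteMany PersistentVolumeClaim. Here is a minimal sketch of such a claim applied via kubectl; the storage class name nfs is an assumption based on the Helm NFS server provisioner's usual default, so check kubectl get storageclass for the name your release actually created:
# Sketch: a 2Ti ReadWriteMany claim; the "nfs" storage class name is an assumption.
kubectl apply -f - <<'EOF'
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: shared-data
spec:
  accessModes:
    - ReadWriteMany
  storageClassName: nfs
  resources:
    requests:
      storage: 2Ti
EOF
All ~100 pods can then mount the shared-data claim concurrently, since the ReadWriteMany access mode allows many writers.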

How to find out if a K8s job failed or succeeded using kubectl?

I have a Kubernetes job that runs for some time, and I need to check if it failed or was successful.
I am checking this periodically:
kubectl describe job/myjob | grep "1 Succeeded"
This works but I am concerned that a change in kubernetes can break this; say, the message is changed to "1 completed with success" (stupid text but you know what I mean) and now my grep will not find what it is looking for.
Any suggestions? This is being done in a bash script.
You can get this information from the job by using JSONPath filtering to select its .status.succeeded field; the command then returns only that value.
From kubectl explain job.status.succeeded:
The number of pods which reached phase Succeeded.
This command will get you that field for the particular job specified:
kubectl get job <jobname> -o jsonpath={.status.succeeded}
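If the script needs to block until the job finishes either way, a polling loop over the status fields is one option. A sketch, with myjob and the 10-second interval as placeholders; it also checks the job's Failed condition so a job that exhausts its retries does not make the loop spin forever:
# Sketch: poll the job status instead of grepping 'kubectl describe' output.
while true; do
  succeeded=$(kubectl get job myjob -o jsonpath='{.status.succeeded}')
  failed=$(kubectl get job myjob -o jsonpath='{.status.conditions[?(@.type=="Failed")].status}')
  if [ "${succeeded:-0}" -ge 1 ]; then
    echo "job succeeded"; break
  elif [ "$failed" = "True" ]; then
    echo "job failed"; break
  fi
  sleep 10
done
Alternatively, kubectl wait --for=condition=complete job/myjob --timeout=600s blocks until the job completes and exits non-zero on timeout, which may be enough if you don't need to distinguish failure from a timeout.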

Resourcequota & multiple memory-limited jobs - restarting pending jobs takes forever

I'm testing Kubernetes with the intention of being able to run batch jobs in a queue. I've created a resourcequota with:
$ kubectl create quota memoryquota --hard=memory=450Mi
This limits the total memory usage of all containers in the namespace to 450Mi. I also have a script run-memhog.sh that creates a memhog job with a memory limit of X and using Y megs of memory:
kubectl run memhog-$(cat /dev/urandom | tr -dc 'a-z0-9' | fold -w 8 | head -n 1) \
  --replicas=1 --restart=OnFailure --limits=memory=$1Mi,cpu=100m --record \
  --image=derekwaynecarr/memhog --command -- memhog -r100 $2m
Running $ for i in {1..4}; do ./run-memhog.sh 200 100; done correctly causes four jobs to be created, two of which complete in around 20 seconds, and the other two, as expected, get a FailedCreate warning with a message
Error creating: pods "memhog-plgxke9m-" is forbidden: exceeded quota: memoryquota, requested: memory=200Mi, used: memory=400Mi, limited: memory=450Mi
Running $ kubectl get jobs shows an expected outcome:
NAME              DESIRED   SUCCESSFUL   AGE
memhog-2covdiww   1         0            35s
memhog-6bg0b6g6   1         1            35s
memhog-plgxke9m   1         0            35s
memhog-w2ujbg1b   1         1            35s
Everything's OK so far, and I'm expecting the two still-uncompleted jobs to start running as soon as the resources become available (i.e. after the previous pods/containers are cleared). However, the jobs stay in a pending state for who knows how long; I checked after two hours and they still hadn't started running, after which I left the server running overnight and the jobs got completed somewhere during that time.
My question is: what is causing the jobs to be pending for such a long time? Is there any way I can poll for resource availability more frequently? I tried to search through both the kubectl reference and the Kubernetes docs, but didn't find any mention of a fix or setting for this.
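I can't say what the controller is doing on your cluster, but while the jobs sit there you can at least watch when the quota frees up and when pod creation is re-attempted. A sketch, assuming the quota name above and the current namespace; the job name is just one of the pending ones from the listing:
# Confirm when memory is actually released by the finished pods.
kubectl describe quota memoryquota
# See how often creation is retried for the pending jobs.
kubectl get events --field-selector reason=FailedCreate --sort-by=.lastTimestamp
# Inspect one of the pending jobs directly.
kubectl describe job memhog-plgxke9m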