Pod is not showing in Ready state - Kubernetes

I am trying to configure the PHP Phabricator example from Kubernetes, but after creating the replication controller the Pod never reaches the Ready state. It shows the following status:
NAME READY STATUS RESTARTS AGE
phabricator-controller-z0nk3 0/1 CrashLoopBackOff 5 2m
Below is the controller manifest:
{
  "kind": "ReplicationController",
  "apiVersion": "v1",
  "metadata": {
    "name": "phabricator-controller",
    "labels": {
      "name": "phabricator"
    }
  },
  "spec": {
    "replicas": 1,
    "selector": {
      "name": "phabricator"
    },
    "template": {
      "metadata": {
        "labels": {
          "name": "phabricator"
        }
      },
      "spec": {
        "containers": [
          {
            "name": "phabricator",
            "image": "fgrzadkowski/example-php-phabricator",
            "ports": [
              {
                "name": "http-server",
                "containerPort": 80
              }
            ]
          }
        ]
      }
    }
  }
}
Can someone please suggest how to fix this?

This Pod is crash-looping. You can tell because the number of restarts is greater than zero.
kubectl describe pods <pod-name>
Should give further details to help debug. As will
kubectl logs <pod-name>
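If the container is crash-looping, the current instance may have no logs yet; the logs of the previous, crashed instance are usually more telling:
kubectl logs <pod-name> --previous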

Tracking issues with kubectl describe pods <pod-name> and kubectl logs <pod-name> is indeed the default approach; unfortunately, in my case it WASN'T helpful (at first). All the logs looked fine, or at least gave no error or clue that something was going wrong.
The readiness and liveness probes, however, showed that the app was not passing them.
So where was the devil hiding? In my case, increasing the values of initialDelaySeconds and/or timeoutSeconds for the readiness and liveness probes did the trick.
My first assumption was that the app simply did not have enough time to reach the Ready state. In fact the app was still failing, BUT extending those values lengthened each deployment attempt, and that let me capture more logs. And what did I get? "Database connection attempt failed due to timeout." So there was no connection to the database, and the app was effectively dead. The tricky part is that these timeouts do not appear quickly, so you need to wait a while; the default values of initialDelaySeconds and timeoutSeconds simply did not give me enough time to see the database connectivity timeout.
Once a firewall rule was added to allow the app to talk to the database, the issue was gone!
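For reference, a minimal sketch of relaxed probe settings in a container spec; the /status path and the exact numbers are assumptions to adapt to your app:
# Sketch only: the /status endpoint and the values below are assumptions
livenessProbe:
  httpGet:
    path: /status            # hypothetical health endpoint
    port: 80
  initialDelaySeconds: 60    # give the app longer to start before the first probe
  timeoutSeconds: 10         # allow slower responses before counting a failure
readinessProbe:
  httpGet:
    path: /status
    port: 80
  initialDelaySeconds: 60
  timeoutSeconds: 10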

Related

How to display node information with a JSON request?

I know how to use the API to perform simple requests, such as displaying node information while selecting nodes by label value.
For example : curl http://localhost:8080/api/v1/nodes?labelSelector=kubernetes.io/role%3Dworker3
displays information about the node whose role is worker3.
Is there a way to perform the same request using a JSON query?
I looked on the web for such an example but did not find one.
You can query by label with kubectl; the roles of a node are just labels.
To return the result in YAML format:
kubectl get nodes -l node-role.kubernetes.io/worker -o yaml
To return the result in JSON format:
kubectl get nodes -l node-role.kubernetes.io/worker -o json
Update
Querying the API with JSON, you can do it like so:
curl http://localhost:8080/api/v1/nodes?{"node.kubernetes.io/worker01":"worker01"}
In my case this returns the following:
{
  "kind": "NodeList",
  "apiVersion": "v1",
  "metadata": {
    "resourceVersion": "317238"
  },
  "items": [
    {
      "metadata": {
        "name": "worker01",
        "uid": "a2bec224-361f-49e9-8bba-b3b172816d6e",
        "resourceVersion": "316653",
        "creationTimestamp": "2022-12-24T11:04:43Z",
        "labels": {
          "beta.kubernetes.io/arch": "amd64",
          "beta.kubernetes.io/os": "linux",
          "kubernetes.io/arch": "amd64",
          "kubernetes.io/hostname": "worker01",
          "kubernetes.io/os": "linux",
          "microk8s.io/cluster": "true",
          "node.kubernetes.io/microk8s-worker": "microk8s-worker"
        },
............
As you can see it works, but in general you must check two things:
the API version (it can differ from v1, depending on the Kubernetes version)
the labels and property names.
The example above comes from MicroK8s; here I don't even have roles defined.
kubectl get node
NAME STATUS ROLES AGE VERSION
master Ready <none> 17d v1.25.4
worker01 Ready <none> 17d v1.25.4
So I looked for a label that could extract the required data.
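For completeness, the server-side equivalent is the labelSelector query parameter from the question itself; with the MicroK8s labels above (no role labels present), an equivalent call can select on the hostname label:
curl "http://localhost:8080/api/v1/nodes?labelSelector=kubernetes.io/hostname%3Dworker01"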

How to pass a flag to klog for structured logging

As part of Kubernetes 1.19, structured logging has been implemented.
I've read that Kubernetes' logging engine is klog and that structured logs follow this format:
<klog header> "<message>" <key1>="<value1>" <key2>="<value2>" ...
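For illustration, a plain-text line in that format could look like this (hypothetical header and values, not from a real cluster):
I1025 00:15:15.525108       1 controller_utils.go:116] "Pod status updated" pod="kube-system/kubedns" status="ready"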
Cool! But even better, you can apparently pass a --logging-format=json flag to klog so that logs are generated directly in JSON:
{
  "ts": 1580306777.04728,
  "v": 4,
  "msg": "Pod status updated",
  "pod": {
    "name": "nginx-1",
    "namespace": "default"
  },
  "status": "ready"
}
Unfortunately, I haven't been able to find out how and where I should specify that --logging-format=json flag.
Is it a kubectl command? I'm using Azure's AKS.
--logging-format=json is a flag that needs to be set on all Kubernetes system components (kubelet, API server, controller manager & scheduler). You can check all flags here.
Unfortunately, you can't do this right now with AKS, as its control plane is managed by Microsoft.
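For comparison, on a self-managed node (which AKS's managed components are not) one way to pass the flag to the kubelet is a systemd drop-in using the kubeadm KUBELET_EXTRA_ARGS convention; the file path below is only an example:
# /etc/systemd/system/kubelet.service.d/10-json-logging.conf  (example path)
[Service]
Environment="KUBELET_EXTRA_ARGS=--logging-format=json"
Then reload and restart: systemctl daemon-reload && systemctl restart kubelet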

Log spam with "unable to find container named fluentd-gcp"

Last night my Kubernetes cluster on GKE was upgraded to 1.16.8-gke.9. Since then, the logs show error: unable to find container named fluentd-gcp every minute. Logging from my applications still works, but I'd like to know what causes this error and how to get rid of it.
Expanding the error yields slightly more details:
{
  "textPayload": "error: unable to find container named fluentd-gcp\n",
  "insertId": "v1b2u2ldrnswujhz2",
  "resource": {
    "type": "k8s_container",
    "labels": {
      "project_id": "foo",
      "pod_name": "fluentd-gke-scaler-cd4d654d7-tgg27",
      "cluster_name": "foo-cluster",
      "container_name": "fluentd-gke-scaler",
      "namespace_name": "kube-system",
      "location": "us-east1-d"
    }
  },
  "timestamp": "2020-04-24T16:15:40.224944500Z",
  "severity": "ERROR",
  "labels": {
    "gke.googleapis.com/log_type": "system",
    "k8s-pod/k8s-app": "fluentd-gke-scaler",
    "k8s-pod/pod-template-hash": "cd4d654d7"
  },
  "logName": "projects/foo/logs/stderr",
  "receiveTimestamp": "2020-04-24T16:15:45.923960735Z"
}
kubectl get all --all-namespaces shows fluentd-gke pods with a fluentd-gke container, not fluentd-gcp.
Any advice would be appreciated and I'm happy to post more details, if you tell me where to look for them.
Edit: More details and related problems on the GKE issue tracker: https://issuetracker.google.com/issues/156965162
This will be fixed in GKE 1.16.9-gke.6 according to the issue tracker: https://issuetracker.google.com/issues/156965162
1.16.8-gke.9 is currently offered through the Rapid channel. Keep in mind that this channel is offered on an early-access basis for people to test new releases; as such, the versions offered may be subject to unresolved issues with no known workaround. That said, a possible fix could be to drain the affected node and migrate your workloads to another one (a typical sequence is sketched below). If the issue persists, create an issue here.
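With <node-name> as a placeholder, the drain-and-migrate sequence looks like this (flag names match kubectl of that release line):
kubectl cordon <node-name>    # mark the node unschedulable
kubectl drain <node-name> --ignore-daemonsets --delete-local-data    # evict workloads so they reschedule elsewhere
kubectl uncordon <node-name>    # only if you keep the node and want it schedulable again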

Is it possible, and how, to limit a Kubernetes Job to create a maximum number of pods if it always fails?

As a QA engineer in our company I am a daily user of Kubernetes, and we use Kubernetes Jobs to create performance-test pods. One advantage of a Job, according to the docs, is
to create one Job object in order to reliably run one Pod to completion
But in our tests this feature creates an endless stream of pods if the previous ones fail, which occupies resources of our team's shared cluster, and deleting such pods takes a lot of time.
Currently the job manifest is like this:
{
  "apiVersion": "batch/v1",
  "kind": "Job",
  "metadata": {
    "name": "upgradeperf",
    "namespace": "ntg6-grpc26-tts"
  },
  "spec": {
    "template": {
      "spec": {
        "containers": [
          {
            "name": "upgradeperfjob",
            "image": "mycompany.com:5000/ncs-cd-qa/upgradeperf:0.1.1",
            "command": [
              "python",
              "/jmeterwork/jmeter.py",
              "-gu",
              "git#gitlab-pri-eastus2.dev.mycompany.net:mobility-ncs-tools/tts-cdqa-tool.git",
              "-gb",
              "upgradeperf",
              "-t",
              "JMeter/testcases/ttssvc/JMeterTestPlan_ttssvc_cmpsize.jmx",
              "-JtestDataFile",
              "JMeter/testcases/ttssvc/testData/avaml_opus.csv",
              "-JthreadNum",
              "3",
              "-JthreadLoopCount",
              "1500",
              "-JresultsFile",
              "results_upgradeperf_cavaml_opus_t3_l1500.csv",
              "-Jhost",
              "mtl-blade32-03.mycompany.com",
              "-Jport",
              "28416"
            ]
          }
        ],
        "restartPolicy": "Never",
        "imagePullSecrets": [
          {
            "name": "docker-registry-secret"
          }
        ]
      }
    }
  }
}
In some cases, such as misconfigured IPs/ports, 'reliably run one Pod to completion' is impossible, and recreating pods is a waste of time and resources.
So is it possible, and how, to limit a Kubernetes Job to create a maximum number (say 3) of pods if it always fails?
Depending on your Kubernetes version, you can resolve this problem with one of these methods:
Set restartPolicy: OnFailure; the failed container will then be restarted within the same Pod, so instead of lots of failed Pods you will get one Pod with lots of restarts.
From Kubernetes 1.8 on, there is a backoffLimit parameter that controls how a failed Job is retried. It defines the number of retries before the Job is considered failed (the default is 6). For this parameter to work you must set restartPolicy: Never, as shown in the sketch below.
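Applied to the manifest from the question, that means adding backoffLimit beside the pod template; 3 is the cap the question asks for, and the rest of the spec stays unchanged:
{
  "apiVersion": "batch/v1",
  "kind": "Job",
  "spec": {
    "backoffLimit": 3,
    "template": {
      "spec": {
        "restartPolicy": "Never",
        ...
      }
    }
  }
}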
You probably didn't set restartPolicy: Never in your pod spec; add that, and I would expect the behavior to match your expectations better.

Failing to delete an RC via the API?

Kubernetes version: 1.02
REST API:
DELETE /api/v1/namespaces/default/replicationcontrollers/test
Body:
{
  "apiVersion": "v1",
  "kind": "ReplicationController",
  "gracePeriodSeconds": 0
}
This fails with:
{
  "kind": "Status",
  "apiVersion": "v1",
  "metadata": {},
  "status": "Failure",
  "message": "converting to : type names don't match (ReplicationController, DeleteOptions), and no conversion 'func (v1.ReplicationController, api.DeleteOptions) error' registered.",
  "code": 500
}
If the body is empty, the delete succeeds, but the pods still exist:
kubectl get rc shows the RC has been deleted
kubectl get pod shows the pods still exist
Why? And how can I delete an RC together with all of its pods via the API DELETE method?
API requests are designed to be fulfilled immediately. Tasks like reaping or recursive deletion are typically handled by a client combining multiple API requests. In this case, you can do what kubectl does when running kubectl delete rc/test (which you can see by adding --v=8):
Set the spec.replicas of rc/test to 0
Watch until status.replicas of rc/test is also 0
Delete rc/test
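Sketched with curl (the host and port are assumptions matching the question's endpoint path; polling stands in for a real watch, and on very old API servers you may need to PUT the full object instead of PATCH):
# Step 1: scale the RC down to zero replicas
curl -X PATCH \
  -H "Content-Type: application/strategic-merge-patch+json" \
  -d '{"spec": {"replicas": 0}}' \
  http://localhost:8080/api/v1/namespaces/default/replicationcontrollers/test
# Step 2: poll until the response shows "status" with "replicas": 0
curl http://localhost:8080/api/v1/namespaces/default/replicationcontrollers/test
# Step 3: delete the now-empty RC
curl -X DELETE http://localhost:8080/api/v1/namespaces/default/replicationcontrollers/test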