How to pass a flag to klog for structured logging - kubernetes

As part of Kubernetes 1.19, structured logging has been introduced.
I've read that Kubernetes' logging engine is klog and that structured logs follow this format:
<klog header> "<message>" <key1>="<value1>" <key2>="<value2>" ...
Cool! But even better, you can apparently pass a --logging-format=json flag to klog so logs are generated in JSON directly!
{
  "ts": 1580306777.04728,
  "v": 4,
  "msg": "Pod status updated",
  "pod": {
    "name": "nginx-1",
    "namespace": "default"
  },
  "status": "ready"
}
Unfortunately, I haven't been able to find out how and where I should specify that --logging-format=json flag.
Is it a kubectl command? I'm using Azure's AKS.

--logging-format=json is a flag which needs to be set on all Kubernetes system components (kubelet, API server, controller manager and scheduler). You can check all the component flags in the Kubernetes component reference documentation.
Unfortunately you can't do it right now with AKS, as the control plane is managed by Microsoft.
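For reference, on a self-managed cluster (e.g. kubeadm-based, where you control the control plane) you would add the flag to each component's invocation. Below is a minimal sketch for the API server's static pod manifest; the file path and surrounding fields assume the usual kubeadm layout, which AKS does not expose:
# /etc/kubernetes/manifests/kube-apiserver.yaml (kubeadm layout; not accessible on AKS)
spec:
  containers:
  - name: kube-apiserver
    command:
    - kube-apiserver
    - --logging-format=json   # switch klog output from plain text to JSON
    # ...all existing flags stay as they are...
For the kubelet, the same flag would typically go into its systemd unit or KUBELET_EXTRA_ARGS.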

Related

Unable to retrieve custom metrics from prometheus-adapter

I am trying to experiment with scaling one of my application pods running on my Raspberry Pi Kubernetes cluster using HPA + custom metrics, but I ran into several issues which, despite reading the documentation on https://github.com/DirectXMan12/k8s-prometheus-adapter and troubleshooting for the past 2 days, I am still having difficulties grasping.
Firstly, I built an ARM-compatible image of k8s-prometheus-adapter and installed it using Helm. I can confirm it's running properly by checking the pod logs.
I have also set up a script which sends the Raspberry Pis' temperatures to Pushgateway, and I can query them via the Prometheus query node_temp, which returns the following series:
node_temp{job="kube4"} 42
node_temp{job="kube1"} 44
node_temp{job="kube2"} 39
node_temp{job="kube3"} 40
Now I want to be able to scale one of my application pods using the above temperature values, as an experiment to understand better how it works.
Below is my k8s-prometheus-adapter Helm values.yml file:
image:
  repository: jaanhio/k8s-prometheus-adapter-arm
  tag: latest
logLevel: 7
prometheus:
  url: http://10.17.0.12
rules:
  default: false
  custom:
    - seriesQuery: 'etcd_object_counts'
      resources:
        template: <<.Resource>>
      name:
        as: "etcd_object"
      metricsQuery: count(etcd_object_counts)
    - seriesQuery: 'node_temp'
      resources:
        template: <<.Resource>>
      name:
        as: "node_temp"
      metricsQuery: count(node_temp)
After installing via Helm, I ran kubectl get apiservices and can see v1beta1.custom.metrics.k8s.io listed.
I then ran kubectl get --raw /apis/custom.metrics.k8s.io/v1beta1 | jq and got the following:
{
  "kind": "APIResourceList",
  "apiVersion": "v1",
  "groupVersion": "custom.metrics.k8s.io/v1beta1",
  "resources": [
    {
      "name": "jobs.batch/node_temp",
      "singularName": "",
      "namespaced": true,
      "kind": "MetricValueList",
      "verbs": [
        "get"
      ]
    },
    {
      "name": "jobs.batch/etcd_object",
      "singularName": "",
      "namespaced": true,
      "kind": "MetricValueList",
      "verbs": [
        "get"
      ]
    }
  ]
}
I then tried to query the value of the registered node_temp metric using kubectl get --raw /apis/custom.metrics.k8s.io/v1beta1/jobs/*/node_temp but got the following response:
Error from server (InternalError): Internal error occurred: unable to list matching resources
Questions:
Why is the node_temp metric associated with the jobs.batch resource type?
Why am I not able to retrieve the value of the metric via kubectl get --raw /apis/custom.metrics.k8s.io/v1beta1/jobs/*/node_temp?
What is a definitive way of figuring out the query path? e.g. /apis/custom.metrics.k8s.io/v1beta1/jobs/*/node_temp; I kind of trial-and-errored until I got to see somewhat of a response. I also see some other paths with namespaces in them, e.g. /apis/custom.metrics.k8s.io/v1beta1/namespaces/*/metrics/foo_metrics
Any help and advice will be greatly appreciated!
Why is the node_temp metric associated with the jobs.batch resource type?
The adapter picks up the labels attached to the Prometheus metrics and tries to map them to Kubernetes resources; in this case your series clearly carry a job label (e.g. job="kube4"), which the <<.Resource>> template interprets as the jobs.batch resource.
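If you don't want that mapping, the adapter also lets you map labels to resources explicitly via resources.overrides instead of the template. A minimal sketch, assuming you add a namespace label to the series you push (that label name is an assumption about your setup):
rules:
  custom:
    - seriesQuery: 'node_temp'
      resources:
        overrides:
          # map the Prometheus 'namespace' label onto the Kubernetes namespace resource
          namespace: { resource: "namespace" }
      name:
        as: "node_temp"
      metricsQuery: sum(<<.Series>>{<<.LabelMatchers>>}) by (<<.GroupBy>>)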
Why am I not able to retrieve the value of the metric via kubectl get --raw /apis/custom.metrics.k8s.io/v1beta1/jobs/*/node_temp?
The metrics are namespaced (see the "namespaced": true in your API resource list), so you'll need /apis/custom.metrics.k8s.io/v1beta1/namespaces/<namespace>/jobs/*/node_temp
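For example (assuming the pushed series get resolved to the default namespace; adjust to wherever yours land):
kubectl get --raw "/apis/custom.metrics.k8s.io/v1beta1/namespaces/default/jobs/*/node_temp" | jq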
What is a definitive way of figuring out the query path? e.g. /apis/custom.metrics.k8s.io/v1beta1/jobs/*/node_temp; I kind of trial-and-errored until I got to see somewhat of a response. I also see some other paths with namespaces in them, e.g. /apis/custom.metrics.k8s.io/v1beta1/namespaces/*/metrics/foo_metrics
Check https://github.com/kubernetes/community/blob/master/contributors/design-proposals/instrumentation/custom-metrics-api.md#api-paths

Cannot create Topic with ARM to Service Bus Namespaces with Geo-Redundant Disaster Recovery

I have created a "Service Bus Namespace with Geo-Redundant Disaster Recovery", which creates 2 premium namespaces with 1 unit each, as it should: https://github.com/Azure/azure-quickstart-templates/tree/master/101-servicebus-create-namespace-geo-recoveryconfiguration
Then I try to create a Topic, but it fails. I'd like to create it with my own ARM template so that I can add new Topics at any time; I would like to create several topics here.
This ARM template seems to try to create a new namespace, while I would like to use the existing namespace created earlier:
https://github.com/Azure/azure-quickstart-templates/tree/master/101-servicebus-topic
New-AzResourceGroupDeployment : 11.05.49 - Resource Microsoft.ServiceBus/namespaces 'sb-namepace-a' failed with message '{
  "error": {
    "message": "SKU change invalid for ServiceBus namespace. Cannot downgrade premium namespace. CorrelationId: 1111f842-1ddf-417a-a302-829b6445e30c",
    "code": "BadRequest"
  }
}'
The error pretty clearly says you are trying to change the SKU. Add the SKU part back and it should work:
"sku": {
"name": "Premium",
"tier": "Premium",
"capacity": 4
},
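Alternatively, if you only want to add topics without redeclaring the namespace at all, a template can contain just the child resource, so the existing namespace is never touched. A minimal sketch (parameter names are placeholders and the apiVersion is an assumption; point namespaceName at the existing namespace):
{
  "$schema": "https://schema.management.azure.com/schemas/2019-04-01/deploymentTemplate.json#",
  "contentVersion": "1.0.0.0",
  "parameters": {
    "namespaceName": { "type": "string" },
    "topicName": { "type": "string" }
  },
  "resources": [
    {
      "type": "Microsoft.ServiceBus/namespaces/topics",
      "apiVersion": "2017-04-01",
      "name": "[concat(parameters('namespaceName'), '/', parameters('topicName'))]",
      "properties": {}
    }
  ]
}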

Log spam with "unable to find container named fluentd-gcp"

Last night my Kubernetes cluster on GKE was upgraded to 1.16.8-gke.9. Since then the logs show error: unable to find container named fluentd-gcp every minute. Logging from my applications still works, but I'd like to know what causes this error and how to get rid of it.
Expanding the error yields slightly more details:
{
  "textPayload": "error: unable to find container named fluentd-gcp\n",
  "insertId": "v1b2u2ldrnswujhz2",
  "resource": {
    "type": "k8s_container",
    "labels": {
      "project_id": "foo",
      "pod_name": "fluentd-gke-scaler-cd4d654d7-tgg27",
      "cluster_name": "foo-cluster",
      "container_name": "fluentd-gke-scaler",
      "namespace_name": "kube-system",
      "location": "us-east1-d"
    }
  },
  "timestamp": "2020-04-24T16:15:40.224944500Z",
  "severity": "ERROR",
  "labels": {
    "gke.googleapis.com/log_type": "system",
    "k8s-pod/k8s-app": "fluentd-gke-scaler",
    "k8s-pod/pod-template-hash": "cd4d654d7"
  },
  "logName": "projects/foo/logs/stderr",
  "receiveTimestamp": "2020-04-24T16:15:45.923960735Z"
}
kubectl get all --all-namespaces shows fluentd-gke pods with a fluentd-gke container, not fluentd-gcp.
Any advice would be appreciated and I'm happy to post more details, if you tell me where to look for them.
Edit: More details and related problems on the GKE issue tracker: https://issuetracker.google.com/issues/156965162
This will be fixed in GKE 1.16.9-gke.6 according to the issue tracker: https://issuetracker.google.com/issues/156965162
1.16.8-gke.9 is currently being offered through the Rapid channel. Keep in mind that this channel is offered on an early-access basis for people to test new releases, and as such the versions offered may be subject to unresolved issues with no known workaround. That said, a possible fix could be to drain the affected node and migrate your workloads to another node. If the issue persists, file a report on the issue tracker linked above.
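A drain might look like this; the node name is a placeholder, and the flags match kubectl of that era:
kubectl drain <node-name> --ignore-daemonsets --delete-local-data
GKE will reschedule the evicted workloads onto the remaining nodes; once the node has been recycled, bring it back with kubectl uncordon <node-name>.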

Is it possible, and how, to limit a Kubernetes Job to create a maximum number of pods if it always fails?

As a QA in our company I am a daily user of Kubernetes, and we use Kubernetes Jobs to create performance-test pods. One advantage of a Job, according to the docs, is
to create one Job object in order to reliably run one Pod to completion
But in our tests this feature will create an endless stream of pods if the previous ones fail, which occupies the resources of our team's shared cluster, and deleting such pods takes a lot of time.
Currently the Job manifest is like this:
{
  "apiVersion": "batch/v1",
  "kind": "Job",
  "metadata": {
    "name": "upgradeperf",
    "namespace": "ntg6-grpc26-tts"
  },
  "spec": {
    "template": {
      "spec": {
        "containers": [
          {
            "name": "upgradeperfjob",
            "image": "mycompany.com:5000/ncs-cd-qa/upgradeperf:0.1.1",
            "command": [
              "python",
              "/jmeterwork/jmeter.py",
              "-gu",
              "git#gitlab-pri-eastus2.dev.mycompany.net:mobility-ncs-tools/tts-cdqa-tool.git",
              "-gb",
              "upgradeperf",
              "-t",
              "JMeter/testcases/ttssvc/JMeterTestPlan_ttssvc_cmpsize.jmx",
              "-JtestDataFile",
              "JMeter/testcases/ttssvc/testData/avaml_opus.csv",
              "-JthreadNum",
              "3",
              "-JthreadLoopCount",
              "1500",
              "-JresultsFile",
              "results_upgradeperf_cavaml_opus_t3_l1500.csv",
              "-Jhost",
              "mtl-blade32-03.mycompany.com",
              "-Jport",
              "28416"
            ]
          }
        ],
        "restartPolicy": "Never",
        "imagePullSecrets": [
          {
            "name": "docker-registry-secret"
          }
        ]
      }
    }
  }
}
In some cases, such as misconfigured IPs/ports, 'reliably run one Pod to completion' is impossible, and recreating pods is a waste of time and resources.
So is it possible, and how, to limit a Kubernetes Job to create a maximum number (say 3) of pods if they always fail?
Depending on your Kubernetes version, you can resolve this problem with these methods:
Set the option restartPolicy: OnFailure; then the failed container will be restarted in the same Pod, so you will not get lots of failed Pods, but instead a Pod with lots of restarts.
From Kubernetes 1.8 on, there is a field backoffLimit to control the retry behavior of a failed Job. It defines the number of retries before the Job is treated as failed; the default is 6. For this field to work you must set restartPolicy: Never. A sketch applying it to your manifest follows below.
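A minimal sketch in YAML form, trimmed to the relevant fields (the limit of 3 matches the number asked for in the question):
apiVersion: batch/v1
kind: Job
metadata:
  name: upgradeperf
  namespace: ntg6-grpc26-tts
spec:
  backoffLimit: 3          # give up after 3 failed pods instead of the default 6
  template:
    spec:
      restartPolicy: Never # each retry creates a fresh pod, counted against backoffLimit
      containers:
      - name: upgradeperfjob
        image: mycompany.com:5000/ncs-cd-qa/upgradeperf:0.1.1
        # command omitted for brevity; keep yours as-is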
You probably didn't set restartPolicy: Never in your pod spec; add that, and I would expect it to match your expected behavior better.

Hyperledger on Bluemix: Failed to launch chaincode spec(Could not get deployment transaction

I'm running a simple Hyperledger network on Bluemix. I can deploy and invoke, but not query. The chaincode function Init sets up a value for the variable "abc": ... stub.PutState("abc", []byte(strconv.Itoa(Aval)))
I should be able to query "abc" as validation that the code is ready to use. Instead, I'm seeing this error:
"... Error:Failed to launch chaincode spec(Could not get deployment
transaction for - LedgerError - ResourceNotFound:
ledger: resource not found)"
The query json is:
{
"jsonrpc": "2.0",
"method": "query",
"params": {
"type": 1,
"chaincodeID": {
"name": "my chaincode id"
},
"ctorMsg": {
"function": "read",
"args": [
"abc"
]
},
"secureContext": "user_type1_3"
},
"id": 0
}
The following is a list of probable causes of the error
Could not get deployment transaction for - LedgerError - ResourceNotFound: ledger: resource not found
1. The chaincode did not get deployed correctly. To check whether the chaincode was deployed correctly, you need to check the peer logs to see if there were any errors when the deploy transaction was sent.
2. The chaincode got deployed correctly, but the consensus mechanism hasn't yet completed. You should ideally wait for a few minutes after deploying a chaincode before you try to query it.
3. The chaincode got deployed, but the chaincode ID/name specified while trying to send a query is incorrect. You need to make sure you use the same chaincode ID that comes back in the response when you deploy a chaincode.
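On point 3: the deploy call on the Fabric 0.6-era REST API returns the generated chaincode name as a long hash, roughly in this shape (the hash below is a placeholder); that message value is what belongs in chaincodeID.name of the query:
{
  "jsonrpc": "2.0",
  "result": {
    "status": "OK",
    "message": "a1b2c3d4e5f6...long-chaincode-hash..."
  },
  "id": 1
}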