Kubernetes - How to aggregate application logs

I have a microservice deployed in a Tomcat container/pod. There are four different files generated in the container - access.log, tomcat.log, catalina.out and application.log (log4j output). What is the best approach to send these logs to Elasticsearch (or a similar platform)?
I read through the information on the Logging Architecture - Kubernetes page. Is “Sidecar container with a logging agent” the best option for my use case?
Is it possible to fetch pod labels (e.g. version) and add them to each line? If it is doable, should I use a logging agent like fluentd? (I just want to know the direction I should take.)

Yes, the best option for your use case is to have one tail -f sidecar per log file and then install either a fluentd or a fluent-bit DaemonSet that will handle shipping and enriching the log events.
The fluentd elasticsearch cluster add-on will install a fluentd DaemonSet and a minimal ES cluster. The ES cluster is not production ready, so please see its README for details on what must be changed.
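For the per-file tail sidecars, a minimal sketch could look like the one below. It assumes the Tomcat container writes its logs into a shared emptyDir volume; the pod name, images, and log path are illustrative only, and you would add one such sidecar (or one tail argument) per file for access.log, tomcat.log, catalina.out and application.log:
apiVersion: v1
kind: Pod
metadata:
  name: tomcat-app                      # hypothetical pod name
  labels:
    version: "1.0"                      # example label to attach to log events
spec:
  containers:
  - name: tomcat
    image: my-registry/my-tomcat-app    # assumption: your application image
    volumeMounts:
    - name: logs
      mountPath: /usr/local/tomcat/logs
  # Streaming sidecar: re-emits the log file on its own stdout,
  # where the node-level fluentd/fluent-bit DaemonSet picks it up.
  - name: application-log
    image: busybox:1.28
    args: [/bin/sh, -c, 'tail -n+1 -F /usr/local/tomcat/logs/application.log']
    volumeMounts:
    - name: logs
      mountPath: /usr/local/tomcat/logs
      readOnly: true
  volumes:
  - name: logs
    emptyDir: {}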

Is it possible to fetch pod labels (e.g. version) and add them to each line?
You can mount information from the Pod's metadata into its file system via the Downward API, and then configure your agent to use this data. Here is an example:
apiVersion: v1
kind: Pod
metadata:
  name: volume-test
spec:
  containers:
  - name: container-test
    image: busybox
    volumeMounts:
    - name: all-in-one
      mountPath: "/projected-volume"
      readOnly: true
  volumes:
  - name: all-in-one
    projected:
      sources:
      - secret:
          name: mysecret
          items:
            - key: username
              path: my-group/my-username
      - downwardAPI:
          items:
            - path: "labels"
              fieldRef:
                fieldPath: metadata.labels
            - path: "cpu_limit"
              resourceFieldRef:
                containerName: container-test
                resource: limits.cpu
      - configMap:
          name: myconfigmap
          items:
            - key: config
              path: my-group/my-config
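With this in place, the Downward API writes the labels to /projected-volume/labels, one label per line in key="value" form, so the logging agent can read that file and attach the values (for example a version label) to every record it ships.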
If it is doable, should I use a logging agent like fluentd?
Tomcat cannot send logs to Elasticsearch by itself; it needs an agent for that (e.g., Fluentd, Logstash). So if you want to use the "Exposing logs directly from the application" option, you need to build a Tomcat image with the agent in it, which ends up being almost the same as the "Sidecar container with a logging agent" option, only harder to configure. The "Exposing logs directly from the application" option is better suited to applications you develop yourself.

Related

K8s configmap for application dynamic configuration

I have a microservice for handling retention policy.
This application has a default configuration for retention, e.g. retention size, file locations, etc.
But we also want to create an API for the user to change this configuration with customized values at runtime.
I created a configmap with the default values, and in the application I used the k8s client library to get/update/watch the configmap.
My question is: is it correct to use a configmap for dynamic business configuration, or is it meant for static configuration that the user is not supposed to touch at runtime?
Thanks in advance
There are no rules against it. A lot of software leverages the kube API for some kind of logic/state, e.g. leader election. All of those require the app to apply changes to a kube resource. With that in mind, do remember that it always puts some additional load on your API server, and if you're unlucky that might become an issue. About two years ago we experienced API limit exhaustion on one of the managed k8s services because we were using a lot of deployments with rather intensive leader election logic (2 requests per pod every 5 seconds). The issue is long gone since then, but it shows what you have to take into account when designing interactions like this (retries, backoffs, etc.).
Using ConfigMaps is perfectly fine for such use cases. You can use a client library to watch for updates on the given ConfigMap, but a cleaner solution is to mount the ConfigMap as a file into the pod and have your application read its configuration from that file. Since you're mounting the ConfigMap as a volume, changes become visible within the pod without a restart (unlike env variables, which only "refresh" once the pod gets recreated).
Let's say you have this configMap:
apiVersion: v1
kind: ConfigMap
metadata:
  name: special-config
  namespace: default
data:
  SPECIAL_LEVEL: very
  SPECIAL_TYPE: charm
And then you mount this configMap as a Volume into your Pod:
apiVersion: v1
kind: Pod
metadata:
  name: dapi-test-pod
spec:
  containers:
    - name: test-container
      image: registry.k8s.io/busybox
      command: [ "/bin/sh", "-c", "ls /etc/config/" ]
      volumeMounts:
      - name: config-volume
        mountPath: /etc/config
  volumes:
    - name: config-volume
      configMap:
        # Provide the name of the ConfigMap containing the files you want
        # to add to the container
        name: special-config
  restartPolicy: Never
When the pod runs, the command ls /etc/config/ produces the output below:
SPECIAL_LEVEL
SPECIAL_TYPE
This way you also reduce "noise" on the API server, since you can simply read the mounted files for configuration updates instead of querying the API.
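One caveat: this automatic refresh only applies when the ConfigMap is mounted as a volume directory; a single key mounted via subPath will not be updated, and propagation of changes can take up to the kubelet sync period plus the ConfigMap cache TTL (about a minute each by default).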

Updating a k8s Prometheus operator's configs to add a scrape target

I have an existing deployment of prometheus operator on my k8s cluster. Now, I want to add a scrape target for a custom exporter I created. I created my prometheus.yaml file, but can't find how to apply this to the existing prometheus operator.
I don't know how and where you create/modify your prometheus.yaml file, but I will show you a convenient way of managing it.
First of all, I would recommend storing the Prometheus configuration file prometheus.yaml as a ConfigMap. This is super useful because changes you make to the ConfigMap are automatically propagated to the pods that consume it, without any involvement on your part.
After you add new scrape configs to the ConfigMap, it will take some time for the changes to propagate. The total delay from the moment the ConfigMap is updated to the moment new keys are projected to the pod can be as long as the kubelet sync period (1 minute by default) plus the TTL of the ConfigMap cache in the kubelet (1 minute by default).
Now it's time to make the changes live.
Prometheus can reload its configuration on the fly. There are two ways to do that; more info can be found under "Reloading Prometheus’ Configuration".
I will focus on only one of them here:
You can send an HTTP POST request to the Prometheus web server:
curl -X POST http://localhost:9090/-/reload
Note that as of Prometheus 2.0, the --web.enable-lifecycle command line flag must be passed for HTTP reloading to work.
If the reload is successful Prometheus will log that it has updated its targets:
INFO[0248] Loading configuration file prometheus.yml source=main.go:196
INFO[0248] Stopping target manager... source=targetmanager.go:203
INFO[0248] Target manager stopped. source=targetmanager.go:216
INFO[0248] Starting target manager... source=targetmanager.go:114
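If the flag is not already set, it can be added to the Prometheus container spec along these lines (just a sketch; the image tag and config file path are assumptions):
containers:
- name: prometheus
  image: prom/prometheus:v2.45.0                      # assumed image/tag
  args:
  - "--config.file=/etc/prometheus/prometheus.yaml"   # assumed config path
  - "--web.enable-lifecycle"                          # enables the /-/reload endpoint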
And a cherry on the cake for you :)
There is a great article, "Prometheus, ConfigMaps and Continuous Deployment", that explains how to monitor Prometheus with Prometheus (maybe it will also be applicable to your other question). The main thing I want to show you is that you can automate the POST request.
Basically, you need a tiny sidecar container that watches for ConfigMap changes and sends the POST request whenever they occur. This sidecar has to be in the same pod as Prometheus.
Copy-pasting the example from the article here for future reference:
spec:
  containers:
  - name: prometheus
    ...
    volumeMounts:
    - name: config-volume
      mountPath: /etc/prometheus
  - name: watch
    image: weaveworks/watch:master-5b2a6e5
    imagePullPolicy: IfNotPresent
    args: ["-v", "-t", "-p=/etc/prometheus", "curl", "-X", "POST", "--fail", "-o", "-", "-sS", "http://localhost:80/-/reload"]
    volumeMounts:
    - name: config-volume
      mountPath: /etc/prometheus
  volumes:
  - name: config-volume
    configMap:
      name: prometheus-config
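Note that if you are running the Prometheus Operator (as in the question), it already injects a config-reloader sidecar into the Prometheus pod that performs this reload for you, so the manual watch sidecar above is mainly relevant for a plain Prometheus deployment.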
Documentation for the operator: https://github.com/prometheus-operator/prometheus-operator/blob/main/Documentation/additional-scrape-config.md
But you have to use additionalScrapeConfigsSecret:
additionalScrapeConfigsSecret:
  enabled: true
  name: additional-scrape-configs
  key: prometheus-additional.yaml
Otherwise you get the error cannot unmarshal !!map into []yaml.MapSlice.
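The referenced Secret could be created along these lines (a sketch; the job name and target are placeholders for your custom exporter):
apiVersion: v1
kind: Secret
metadata:
  name: additional-scrape-configs     # must match "name" above
stringData:
  prometheus-additional.yaml: |       # must match "key" above
    - job_name: "custom-exporter"     # placeholder job name
      static_configs:
        - targets: ["custom-exporter.default.svc:9100"]   # placeholder target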
Here is better documentation:
https://github.com/prometheus-community/helm-charts/blob/8b45bdbdabd9b54766c4beb3c562b766b268a034/charts/kube-prometheus-stack/values.yaml#L2691
According to this, you can add scrape configs without packaging them into a secret, like this:
additionalScrapeConfigs: |
  - job_name: "prometheus"
    static_configs:
      - targets: ["localhost:9090"]

Identify which GKE Node is serving a client request

I have deployed an application on Google Kubernetes Engine. I would like to identify which client request is being serviced by which node/pod in GKE. Is there a way to map a client request to the pod/node that serviced it?
The answer to your question greatly depends on the amount of monitoring and instrumentation you have at your disposal.
The most common way to go about it is to add a prometheus client to the code running on your pods, and use it to write metrics containing labels that can identify the client requests you are interested in.
Once Prometheus scrapes your metrics, they will be enriched with the node/pod emitting them, and you can get the data you are after.
I think the Downward API is what you need. It allows you to expose Pod and node information to the running container. Your application can simply echo the content of certain env variables containing the information you need. This way you can see which Pod, scheduled on which node, is handling a particular request.
A few words about what it is from kubernetes documentation:
There are two ways to expose Pod and Container fields to a running Container:
Environment variables
Volume Files
Together, these two ways of exposing Pod and Container fields are called the Downward API.
I would recommend taking a closer look specifically at Exposing Pod Information to Containers Through Environment Variables. The following example Pod exposes its own name as well as the node name to the container:
apiVersion: v1
kind: Pod
metadata:
  name: dapi-envars-fieldref
spec:
  containers:
    - name: test-container
      image: k8s.gcr.io/busybox
      command: [ "sh", "-c"]
      args:
      - while true; do
          echo -en '\n';
          printenv MY_NODE_NAME MY_POD_NAME;
          sleep 10;
        done;
      env:
        - name: MY_NODE_NAME
          valueFrom:
            fieldRef:
              fieldPath: spec.nodeName
        - name: MY_POD_NAME
          valueFrom:
            fieldRef:
              fieldPath: metadata.name
  restartPolicy: Never
It's just an example that I hope meets your particular requirements, but keep in mind that you can expose much more relevant information this way. Take a quick look at the list of capabilities of the Downward API.

Container not maintaining its state using kubernetes?

I have a service which runs in Apache. The container status shows as completed and it keeps restarting. Why is the container not staying in the running state, even though the arguments passed have no issues?
apiVersion: apps/v1
kind: Deployment
metadata:
  name: ***
spec:
  selector:
    matchLabels:
      app: ***
  replicas: 1
  template:
    metadata:
      labels:
        app: ***
    spec:
      containers:
      - name: ***
        image: ****
        command: ["/bin/sh", "-c"]
        args: ["echo\ sid\ |\ sudo\ -S\ service\ mysql\ start\ &&\ sudo\ service\ apache2\ start"]
        volumeMounts:
        - mountPath: /var/log/apache2/
          name: apache
        - mountPath: /var/log/***/
          name: ***
      imagePullSecrets:
      - name: regcred
      volumes:
      - name: apache
        hostPath:
          path: "/home/sandeep/logs/apache"
      - name: vusmartmaps
        hostPath:
          path: "/home/sandeep/logs/***"
Soon after executing these arguments, it shows its status as completed and goes into a loop. What can we do to keep its status as running?
Please be advised this is not a good practice.
If you really want it working that way, your last process must not end.
For example, add sleep 9999 to your container's args.
The best option would be to split those into 2 separate Deployments.
First, it would be easy to scale them independently.
Second, the image would be smaller for each Deployment.
Third, Kubernetes would have full control over those Deployments and you could utilize self-healing and rolling updates.
There is a really good guide and examples on Deploying WordPress and MySQL with Persistent Volumes, which I think would be perfect for you.
But if you prefer to use just one pod, then you would need to split your image or use the official Docker images, and your pod might look like this:
apiVersion: v1
kind: Pod
metadata:
  name: app
  labels:
    app: test
spec:
  containers:
  - name: mysql
    image: mysql:5.6
    env:
    # the mysql image will not start without a root password setting; empty password is for demo purposes only
    - name: MYSQL_ALLOW_EMPTY_PASSWORD
      value: "yes"
  - name: apache
    image: httpd:alpine
    ports:
      - containerPort: 80
    volumeMounts:
    - name: apache
      mountPath: /var/log/apache2/
  volumes:
  - name: apache
    hostPath:
      path: "/home/sandeep/logs/apache"
You would need to expose the pod using Service:
$ kubectl expose pod app --type=NodePort --port=80
service "app" exposed
Checking what port it has:
$ kubectl describe service app
...
NodePort: <unset> 31418/TCP
...
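Equivalently, the Service could be written out as a manifest instead of using kubectl expose (a sketch; the selector assumes the app: test label from the Pod above):
apiVersion: v1
kind: Service
metadata:
  name: app
spec:
  type: NodePort
  selector:
    app: test          # matches the Pod's label
  ports:
  - port: 80
    targetPort: 80     # containerPort of the apache container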
Also you should read Communicate Between Containers in the Same Pod Using a Shared Volume.
You want to start apache and mysql in the same container and keep it running, don't you?
Well, let's break down why it exits first. Kubernetes, just like Docker, will run whatever command you give inside the container. If that command finishes, the container stops. echo sid | sudo -S service mysql start && sudo service apache2 start asks your init system to start both mysql and apache, but the thing is that Kubernetes is not aware of any init inside the container.
In fact, the command statement itself becomes the process with PID 1 instead of an init process, overriding whatever default startup command you have in your container image. Whenever the process with PID 1 exits, the container stops.
Therefore, in your case you would have to start whatever init system you have in your container and keep it running in the foreground.
However, this brings us to another problem: Kubernetes already acts as an init system. It starts your pods and supervises them. Therefore, all you need is to start two containers instead: one for mysql and another one for apache.
For example, you could use the official Docker Hub images from https://hub.docker.com/_/httpd and https://hub.docker.com/_/mysql. They already come configured to start up correctly, so you don't even have to specify command and args in your deployment manifest.
Containers are not tiny VMs. You need two in this case, one running MySQL and another running Apache. Both have standard community images available, which I would probably start with.

How to enable logging for third party containers in Kubernetes?

Similar to Docker, where the snippet below is used in a Compose file to configure logging for third-party containers (mariadb, opentsdb, ...) so that their logs show up in Kibana:
logging:
  driver: fluentd
  options:
    fluentd-address: "0.0.0.0:24224"
    tag: "docker.{{.ID}}"
I want to ask how to configure the same for Kubernetes.
Basically, you can use fluentd to collect logs and push them to third-party log storage (Stackdriver or Elasticsearch). To ensure that fluentd is running on every cluster node, we can use a DaemonSet object.
As an example, let's look at part of the file content:
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: fluentd
  namespace: kube-system
...
spec:
  ...
    spec:
      containers:
      - name: fluentd
        image: quay.io/fluent/fluentd-kubernetes-daemonset
        env:
          - name: FLUENT_ELASTICSEARCH_HOST
            value: "elasticsearch-logging"
          - name: FLUENT_ELASTICSEARCH_PORT
            value: "9200"
        ...
This article describes the most important steps to get everything set up.
Get Fluentd DaemonSet sources
We have created a Fluentd DaemonSet that has the proper rules and container image ready to get started:
https://github.com/fluent/fluentd-kubernetes-daemonset
Please grab a copy of the repository from the command line using GIT:
$ git clone https://github.com/fluent/fluentd-kubernetes-daemonset
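From there, the DaemonSet manifest matching your backend (for example, the Elasticsearch variant in that repository) can be applied with kubectl apply -f, after adjusting the FLUENT_ELASTICSEARCH_HOST and FLUENT_ELASTICSEARCH_PORT environment variables to point at your Elasticsearch endpoint.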