Ship logs in Kubernetes

I have some services managed by Kubernetes; each service runs some number of pods. I want to develop a new service that analyzes the logs of all the other services. To do this, I first need to ship the logs to the new service, but I don't know how.
I can divide my question into two parts.
1. How should I access/read the logs? Should I read from /var/log, or run each app through a pipe like this:
./app | myprogram
so that myprogram receives the logs of app on standard input?
2. How can I send the logs to another service? My options are gRPC and Kafka (or RabbitMQ).
Using a CephFS volume could also be a solution, but that seems to be an anti-pattern (see How to share storage between Kubernetes pods?).

Below is the basic workflow for collecting logs from your pods and sending them to a logging tool. I have used Fluent Bit (open source) as the example, but you can use tools like Fluentd, Logstash, or Filebeat in the same way.
Pod logs are stored at a specific path on each node -> Fluent Bit runs as a DaemonSet and collects the logs from the nodes using its input plugins -> Fluent Bit's output plugins send the logs to your logging tool (Elastic, Datadog, Logiq, etc.).
Fluent Bit is an open-source log shipper and processor that collects data from multiple sources and forwards it to different destinations. It has various input plugins for collecting log data from specific paths or ports, and output plugins for delivering the logs to Elasticsearch or any other log collector.
Please follow the instructions below to install Fluent Bit:
https://medium.com/kubernetes-tutorials/exporting-kubernetes-logs-to-elasticsearch-using-fluent-bit-758e8de606af
or
https://docs.fluentbit.io/manual/installation/kubernetes
To get you started, below is a configuration showing what your input plugin should look like. It uses the tail plugin; note the Path, which is where the logs are stored on the nodes of the Kubernetes cluster.
https://docs.fluentbit.io/manual/pipeline/inputs
The Parser can be changed according to your requirements or the format of your logs.
[INPUT]
    Name              tail
    Tag               kube.*
    Path              /var/log/containers/*.log
    Parser            docker
    DB                /var/log/flb_kube.db
    Skip_Long_Lines   On
    Refresh_Interval  60
    Mem_Buf_Limit     1MB
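If you also want Kubernetes metadata (pod name, namespace, labels) attached to each record, Fluent Bit ships a kubernetes filter that can enrich the tailed records. A minimal sketch (the values shown are illustrative defaults, see the filter docs for details):
[FILTER]
    Name                kubernetes
    Match               kube.*
    Kube_URL            https://kubernetes.default.svc:443
    Merge_Log           On
    Keep_Log            Off
    K8S-Logging.Parser  On
    K8S-Logging.Exclude On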
Below is an example of an output plugin. This one is the http plugin, pointed at wherever your log collector is listening; there are various output plugins you can configure, depending on the logging tool you choose.
https://docs.fluentbit.io/manual/pipeline/outputs
The following uses the http plugin to send data to an HTTP (80/443) endpoint:
[OUTPUT]
    Name          http
    Match         *
    Host          <Hostname>
    Port          80
    URI           /v1/json_batch
    Format        json
    tls           off
    tls.verify    off
    net.keepalive off
    compress      gzip
Below is an output to Elasticsearch:
[OUTPUT]
    Name            es
    Match           *
    Host            <hostname>
    Port            <port>
    HTTP_User       <user-name>
    HTTP_Passwd     <password>
    Logstash_Format On
    Retry_Limit     False
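These INPUT and OUTPUT sections normally live together in a single fluent-bit.conf, typically mounted into the DaemonSet from a ConfigMap. A minimal SERVICE section to head that file could look like this (values are illustrative):
[SERVICE]
    Flush        5
    Log_Level    info
    Parsers_File parsers.conf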

Related

Fluency with forward plugin: how to add kubernetes metadata to logs

Hey, I have a question.
I'm using logback-more-appenders (the Fluency plugin) to send logs to an EFK stack (fluent-bit) running in a Kubernetes cluster, but the logs lack Kubernetes metadata (like node/pod names).
I know I can use <additionalField></additionalField> in logback.xml to add the service name (because that is static), but I cannot do that for dynamic parts like the node or pod name.
I tried to do it on the fluent-bit side using the kubernetes filter, but that only works with the tail/systemd inputs, not the forward one (it parses the tag, which contains the file name with the namespace and pod name). I'm using the forward plugin to send logs from Java software to Elasticsearch, and in logback.xml I cannot enter a dynamic pod name (or I don't know whether I can).
Any tips on how I can do this? I would prefer to send logs using Fluency instead of sniffing the host's container logs.
In my case, the best I could come up with was to change from the forward plugin to the tail plugin with structured logging (in JSON).
Have you tried passing the pod name and node name as environment variables and referencing them in logback.xml as additional fields, so that the metadata is attached to the log events?
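A minimal sketch of that idea: expose the pod and node name through the Kubernetes Downward API in the container spec, then reference them in logback.xml. The field names below are illustrative, and the key/value layout follows logback-more-appenders' additionalField element:
env:
- name: POD_NAME
  valueFrom:
    fieldRef:
      fieldPath: metadata.name    # the pod's own name
- name: NODE_NAME
  valueFrom:
    fieldRef:
      fieldPath: spec.nodeName    # the node the pod runs on
Then in logback.xml (logback resolves ${...} from the environment at startup):
<additionalField>
  <key>pod_name</key>
  <value>${POD_NAME}</value>
</additionalField>
<additionalField>
  <key>node_name</key>
  <value>${NODE_NAME}</value>
</additionalField>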

how to force all kubernetes services (proxy, kubelet, apiserver..., containers) to write logs to /var/logs

How can I force all Kubernetes services (proxy, kubelet, apiserver..., containers) to write their logs to /var/logs?
For example:
/var/logs/apiServer.log
or:
/var/logs/proxy.log
Can I use a syslog config to do that? What would an example of that config look like?
I have already tried the journald configuration ForwardToSyslog=yes.
The first thing that comes to mind: create a sidecar container that gathers all the logs in one place.
See also: The Complete Guide to Kubernetes Logging.
That's a pretty broad question that should be divided into a few parts; Kubernetes stores different types of logs in different places.
Kubernetes container logs (outside the scope of this question, but simply kubectl logs <podname>, plus -n <namespace> if it's not the default and -c <container> to specify a container inside the pod)
Kubernetes Node Logs
Kubernetes Cluster Logs
Kubernetes Node Logs
Depending on your operating system and services, there are various node-level logs you can collect, such as kernel logs or systemd logs. On nodes with systemd, both the kubelet and the container runtime write to journald. If systemd is not present, they write to .log files in the /var/log directory.
You can access systemd logs with the journalctl command.
The tutorial Logging with journald has a thorough explanation of how to configure journalctl to gather logs, both with log-aggregation tools like ELK and without them. journald log filtering can simplify your life.
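For example, on a node with systemd:
journalctl -u kubelet --since "1 hour ago"   # recent kubelet logs
journalctl -u docker -f                      # follow the container runtime logs, if it runs as a systemd unit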
There are two ways of centralizing journal entries via syslog:
the syslog daemon acts as a journald client (like journalctl, Logstash, or Journalbeat)
journald forwards messages to syslog (via a socket)
Option 1 is slower – reading from the journal is slower than reading from the socket – but it captures all the fields from the journal.
Option 2 is safer (e.g. no issues with journal corruption), but the journal will only forward the traditional syslog fields (like severity, hostname, message...).
Regarding ForwardToSyslog=yes in /etc/systemd/journald.conf: it writes messages, in syslog format, to /run/systemd/journal/syslog. You can then feed this file to rsyslog, for example, and either process the logs manually or move them to the desired place.
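A minimal sketch of that setting in /etc/systemd/journald.conf:
[Journal]
ForwardToSyslog=yes
Restart the journal daemon afterwards:
systemctl restart systemd-journald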
Kubernetes Cluster Logs
By default, system components outside a container write files to journald, while components running in containers write to /var/log directory. However, there is the option to configure the container engine to stream logs to a preferred location.
Kubernetes doesn’t provide a native solution for logging at cluster level. However, there are other approaches available to you:
Use a node-level logging agent that runs on every node
Add a sidecar container for logging within the application pod
Expose logs directly from the application.
P.S. I have NOT tried the approach below, but it looks promising – check it out, and maybe it will help you with this not-so-easy task.
The easiest way of setting up a node-level logging agent is to configure a DaemonSet to run the agent on each node:
helm install --name st-agent \
--set infraToken=xxxx-xxxx \
--set containerToken=xxxx-xxxx \
--set logsToken=xxxx-xxxx \
--set region=US \
stable/sematext-agent
This setup will, by default, send all cluster and container logs to a central location for easy management and troubleshooting. With a tiny bit of added configuration, you can configure it to collect node-level logs and audit logs as well.

Logging application logs in DataDog

Using the official Datadog docs I am able to see the K8s stdout/stderr logs in the Datadog UI; my goal is to get the application logs that my Spring Boot application writes to a certain location in my pod.
Configurations done in cluster :
Created ServiceAccount in my cluster along with cluster role and cluster role binding
Created K8s secret to hold DataDog API key
Deployed the DataDog Agent as daemonset in all nodes
Configurations done in App :
Downloaded datadog.jar and instrumented my app execution with it
Exposed ports 8125 and 8126
Added environment tags DD_TRACE_SPAN_TAGS, DD_TRACE_GLOBAL_TAGS in deployment file
Changed pattern in logback.xml
Added logs config in deployment file
Added env tags in deployment file
After doing the above configuration I am able to see stdout/stderr logs, whereas I want the application logs in the Datadog UI.
If someone has done this, please let me know what I am missing here.
If required, I can share the configurations as well. Thanks in advance!
When installing Datadog in your K8s cluster, you install a node logging agent as a DaemonSet with various volume mounts on the hosting nodes. Among other things, this gives Datadog access to the pod logs at /var/log/pods and the container logs at /var/lib/docker/containers.
Kubernetes and the underlying Docker engine only include output from stdout and stderr in those two locations (see here for more information). Everything that containers write to log files residing inside the containers is invisible to K8s, unless more configuration is applied to extract that data, e.g. by applying the sidecar container pattern.
So, to get things working in your setup, configure logback to log to stdout rather than to /var/app/logs/myapp.log, e.g. with a ConsoleAppender as sketched below.
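A minimal logback.xml that writes every log event to stdout (the pattern is only an illustration, adjust it to your needs):
<configuration>
  <appender name="STDOUT" class="ch.qos.logback.core.ConsoleAppender">
    <encoder>
      <!-- plain-text pattern; a JSON encoder also works well for log parsing -->
      <pattern>%d{HH:mm:ss.SSS} [%thread] %-5level %logger{36} - %msg%n</pattern>
    </encoder>
  </appender>
  <root level="INFO">
    <appender-ref ref="STDOUT"/>
  </root>
</configuration>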
Also, if you don't use APM, there is no need to instrument your code with datadog.jar and do all the tracing setup (exposing ports etc.).

How to collect logs from java app (k8s) to fluentd(k8s)

I have a Java app in K8s and fluentd (DaemonSet). In the fluentd conf:
<source>
  @type forward
  port 24224
</source>
<match **>
  @type stdout
</match>
I am a little bit confused.
Do I need to use the fluentd-logger-java library? I read in the docs that I need to add a remotehost for fluentd, but here I don't use a Service at all.
How will the app send logs to the fluentd pods?
Thanks in advance!
Given that your Java application can log to stdout and stderr, you'll use fluentd to read those logs and, in most cases, ship them to a system that can aggregate them.
The official docs contain a picture of a common pattern for configuring node-level logging in Kubernetes, with e.g. fluentd as Pods deployed with a DaemonSet.
In that picture, the logging agent would be fluentd, and my-pod would be your Pod with a container running your Java app. The logging backend is, from a fluentd configuration perspective, optional but of course highly recommended; alternatively, you can simply output your logs via fluentd's stdout.
For this to function properly, fluentd needs read access to the container logs; this is accomplished by mounting the log directory, e.g. /var/lib/docker/containers, into the fluentd container.
We've successfully used this fluentd example ConfigMap, with some modifications, to read logs from the nodes and ship them to Elasticsearch. Check out the containers.input.conf part of that ConfigMap for more info on container logs and how to digest them.
Note that you shouldn't need the fluentd-logger-java library to start using fluentd, although you could use it as another type of logger in your Java application. Out of the box, you should be able to let Java log everything to stdout and stderr and read the logs with fluentd, roughly as sketched below.
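For reference, the container-log piece of such a fluentd configuration is essentially a tail source. A trimmed sketch (paths follow the common Docker-based layout, adjust them to your cluster):
<source>
  @type tail
  path /var/log/containers/*.log              # symlinks to the container log files
  pos_file /var/log/fluentd-containers.log.pos
  tag kubernetes.*
  read_from_head true
  <parse>
    @type json                                # Docker writes one JSON object per line
  </parse>
</source>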
If you are just concerned with live logs, you can try a product built on fluentd, Elasticsearch, and Kibana: https://logdna.com.
Just add a tag and deploy the DaemonSet.
You can try its free trial for a few days.

Forwarding logs from kubernetes to splunk

I'm pretty new to Kubernetes and don't have hands-on experience with it.
My team is facing an issue regarding the log format pushed by Kubernetes to Splunk.
The application pushes logs to stdout in this format:
{"logname" : "app-log", "level" : "INFO"}
Splunk eventually gets this format (splunkforwarder is used):
{
  "log" : "{\"logname\": \"app-log\", \"level\": \"INFO \"}",
  "stream" : "stdout",
  "time" : "2018-06-01T23:33:26.556356926Z"
}
This format makes it harder to query based on properties in Splunk.
Are there any options in Kubernetes to forward the raw logs from the app rather than wrapping them in another JSON document?
I came across this post on Splunk, but the configuration there is done on the Splunk side.
Please let me know if there is any option on the Kubernetes side to send raw logs from the application.
Kubernetes architecture provides three ways to gather logs:
1. Use a node-level logging agent that runs on every node.
You can implement cluster-level logging by including a node-level logging agent on each node. The logging agent is a dedicated tool that exposes logs or pushes logs to a backend. Commonly, the logging agent is a container that has access to a directory with log files from all of the application containers on that node.
The log format depends on your Docker settings. You need to set the log-driver parameter in /etc/docker/daemon.json on every node.
For example:
{
  "log-driver": "syslog"
}
or
{
  "log-driver": "json-file"
}
none – no logs are available for the container, and docker logs does not return any output.
json-file – the logs are formatted as JSON. This is the default logging driver for Docker.
syslog – writes logging messages to the syslog facility.
For more options, check the Docker documentation on logging drivers.
2. Include a dedicated sidecar container for logging in an application pod.
You can use a sidecar container in one of the following ways:
The sidecar container streams application logs to its own stdout.
The sidecar container runs a logging agent, which is configured to pick up logs from an application container.
By having your sidecar containers stream to their own stdout and stderr streams, you can take advantage of the kubelet and the logging agent that already run on each node. The sidecar containers read logs from a file, a socket, or journald, and each individual sidecar container prints the logs to its own stdout or stderr stream; a minimal example of the first variant is sketched below.
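A minimal sketch of the first variant (image names and paths are placeholders): the app writes to a file on a shared emptyDir volume, and a sidecar tails that file to its own stdout, where the kubelet and the node-level agent can pick it up.
apiVersion: v1
kind: Pod
metadata:
  name: app-with-log-sidecar
spec:
  containers:
  - name: app
    image: my-app:latest              # placeholder: your application image
    volumeMounts:
    - name: app-logs
      mountPath: /var/app/logs        # the app writes app.log here
  - name: log-streamer                # sidecar: re-emits the file on stdout
    image: busybox
    args: [/bin/sh, -c, 'tail -n+1 -F /var/app/logs/app.log']
    volumeMounts:
    - name: app-logs
      mountPath: /var/app/logs
  volumes:
  - name: app-logs
    emptyDir: {}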
3. Push logs directly to a backend from within an application.
You can implement cluster-level logging by exposing or pushing logs directly from every application.
For more information, you can check the official Kubernetes documentation.
This week we had the same issue.
Using the Splunk forwarder DaemonSet, installing this add-on on Splunk will solve your issue: https://splunkbase.splunk.com/app/3743/
Just to follow up with the solution we tried; this worked for our log structure (these are Splunk props.conf settings):
SEDCMD-1_unjsonify = s/{"log":"(?:\\u[0-9]+)?(.*?)\\n","stream.*/\1/g
SEDCMD-2_unescapequotes = s/\\"/"/g
BREAK_ONLY_BEFORE={"logname":