How to collect logs from java app (k8s) to fluentd(k8s) - kubernetes

I have a Java app in k8s and fluentd (DaemonSet). In my fluentd conf:
<source>
  @type forward
  port 24224
</source>
<match **>
  @type stdout
</match>
I am a little bit confused. Do I need to use the fluent-logger-java library? I read in the docs that I need to add a remote host for fluentd, but here I don't use a Service at all.
How will the app send logs to the fluentd pods?
Thanks in advance!

Given that your Java application can log to stdout and stderr, you'll use fluentd to read those logs and, in most cases, ship them to a system that can aggregate them.
This picture, from the official docs, shows a common pattern of configuring node-level logging in Kubernetes with e.g. fluentd as Pods deployed with a DaemonSet:
In the above picture, the logging-agent is fluentd and my-pod is your Pod with a container running your Java app. The Logging Backend is, from a fluentd configuration perspective, optional but of course highly recommended; alternatively, you can simply let fluentd output the logs to its own stdout.
For this to function properly, fluentd needs read access to the container logs. This is accomplished by mounting the log directory, e.g. /var/lib/docker/containers, into the fluentd container.
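A minimal sketch of what those mounts can look like in the fluentd DaemonSet (the image tag and volume names here are illustrative, not from the original question):

```yaml
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: fluentd
  namespace: kube-system
spec:
  selector:
    matchLabels:
      app: fluentd
  template:
    metadata:
      labels:
        app: fluentd
    spec:
      containers:
      - name: fluentd
        image: fluent/fluentd-kubernetes-daemonset:v1-debian-elasticsearch
        volumeMounts:
        # fluentd tails the node's container logs through these mounts
        - name: varlog
          mountPath: /var/log
        - name: dockercontainers
          mountPath: /var/lib/docker/containers
          readOnly: true
      volumes:
      - name: varlog
        hostPath:
          path: /var/log
      - name: dockercontainers
        hostPath:
          path: /var/lib/docker/containers
```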
We’ve successfully used this fluentd example ConfigMap, with some modifications to read logs from the nodes and ship them to Elasticsearch. Check out the containers.input.conf part of that ConfigMap for more info on container logs and how to digest them.
Note that you shouldn't need the fluent-logger-java library to start using fluentd, although you could use it as an additional type of logger in your Java application. Out of the box, you can let Java log everything to stdout and stderr and read the logs with fluentd.

If you are just concerned with live logs, you can try a product built on Fluentd, Elasticsearch and Kibana; you can get it at https://logdna.com.
Just add a tag and deploy the DaemonSet.
You can try its free trial for a few days.

Related

ship logs in kubernetes

I have some services managed by Kubernetes; each service has some number of pods. I want to develop a new service to analyze the logs of all the other services. To do this, I first need to ship the logs to my new service, but I don't know how.
I can divide my question into two parts.
1- How should I access/read the logs? Should I read from /var/log, or run the apps through a pipe like this:
./app | myprogram
where myprogram receives the logs of app on standard input.
2- How can I send logs to another service? My options are gRPC and Kafka (or RabbitMQ).
Using a CephFS volume could also be a solution, but that seems to be an anti-pattern (see How to share storage between Kubernetes pods?)
Below is the basic workflow of how you can collect logs from your pods and send them to a logging tool. I have taken Fluent Bit (open source) as the example, but you can use tools like Fluentd/Logstash/Filebeat.
Pod logs are stored at a specific path on the nodes -> Fluent Bit runs as a DaemonSet and collects logs from the nodes using its input plugins -> the output plugins of Fluent Bit send the logs to your logging tool (Elastic / Datadog / Logiq etc.)
Fluent Bit is an open source log shipper and processor that collects data from multiple sources and forwards it to different destinations. Fluent Bit has various input plugins which can collect log data from specific paths or ports, and output plugins to deliver the logs to Elastic or any other log collector.
Please follow the instructions below to install Fluent Bit:
https://medium.com/kubernetes-tutorials/exporting-kubernetes-logs-to-elasticsearch-using-fluent-bit-758e8de606af
or
https://docs.fluentbit.io/manual/installation/kubernetes
To get you started, below is a configuration of how your input plugin should look. It uses the tail plugin; notice the Path, which is where the logs are stored on the nodes of the Kubernetes cluster.
https://docs.fluentbit.io/manual/pipeline/inputs
The parser can be changed according to your requirements or the format of the log.
[INPUT]
    Name              tail
    Tag               kube.*
    Path              /var/log/containers/*.log
    Parser            docker
    DB                /var/log/flb_kube.db
    Skip_Long_Lines   On
    Refresh_Interval  60
    Mem_Buf_Limit     1MB
Below is an example of an output plugin. This is the http plugin, pointing at where the log collector will be listening; there are various other output plugins you can configure, depending on the logging tool you choose.
https://docs.fluentbit.io/manual/pipeline/outputs
The below uses the http plugin to send data to an http (80/443) endpoint:
[OUTPUT]
    Name              http
    Match             *
    Host              <Hostname>
    Port              80
    URI               /v1/json_batch
    Format            json
    tls               off
    tls.verify        off
    net.keepalive     off
    compress          gzip
Below is an output to Elasticsearch:
[OUTPUT]
    Name              es
    Match             *
    Host              <hostname>
    Port              <port>
    HTTP_User         <user-name>
    HTTP_Passwd       <password>
    Logstash_Format   On
    Retry_Limit       False

Logging application logs in DataDog

Using the official Datadog docs, I am able to see the K8s stdout/stderr logs in the Datadog UI; my goal is to see the app logs which are generated by a Spring Boot application at a certain location in my pod.
Configurations done in cluster :
Created ServiceAccount in my cluster along with cluster role and cluster role binding
Created K8s secret to hold DataDog API key
Deployed the DataDog Agent as daemonset in all nodes
Configurations done in App :
Downloaded datadog.jar and ran it alongside my app execution
Exposed ports 8125 and 8126
Added environment tags DD_TRACE_SPAN_TAGS, DD_TRACE_GLOBAL_TAGS in deployment file
Changed pattern in logback.xml
Added logs config in deployment file
Added env tags in deployment file
After doing the above configurations I am able to see stdout/stderr logs, whereas I wanted to see the application logs in the Datadog UI.
If someone has done this, please let me know what I am missing here.
If required, I can share the configurations as well. Thanks in advance
When installing Datadog in your K8s Cluster, you install a Node Logging Agent as a Daemonset with various volume mounts on the hosting nodes. Among other things, this gives Datadog access to the Pod logs at /var/log/pods and the container logs at /var/lib/docker/containers.
Kubernetes and the underlying Docker engine only include output from stdout and stderr in those two locations (see here for more information). Everything that containers write to log files residing inside the container filesystem is invisible to K8s, unless more configuration is applied to extract that data, e.g. by applying the sidecar container pattern.
So, to get things working in your setup, configure Logback to log to stdout rather than to /var/app/logs/myapp.log.
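A minimal logback.xml sketch that routes everything to stdout via a ConsoleAppender (the appender name and pattern are illustrative):

```xml
<configuration>
  <!-- Console appender: writes to stdout, which Kubernetes captures -->
  <appender name="STDOUT" class="ch.qos.logback.core.ConsoleAppender">
    <encoder>
      <pattern>%d{yyyy-MM-dd HH:mm:ss} %-5level %logger{36} - %msg%n</pattern>
    </encoder>
  </appender>
  <root level="INFO">
    <appender-ref ref="STDOUT" />
  </root>
</configuration>
```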
Also, if you don't use APM there is no need to instrument your code with the datadog.jar and do all that tracing setup (setting up ports etc).

How to send json logs through fluentd to stackdriver

I have docker containers writing logs in JSON format. When they run on GKE, the logs show up in Stackdriver fine, but when I run the same containers on a VM with Kubernetes (not GKE) and use fluentd to route the logs to Stackdriver, the log messages arrive escaped and under a "log" key.
Example: {"stream":"stdout","log":"{\"time\":\"2019-07-25T09:55:18.2393210Z\", ....
How can I configure fluentd to get the logs in the same format as on GKE (unescaped, without the "log" key)?
There are a few things to consider:
You can configure fluentd's log format with this guide.
You can try some reverse engineering. The fluentd config used by GKE can be studied at the following path on the fluentd Pod: /etc/google-fluentd/config.d/containers.input.conf
You can directly check the GKE config in a ConfigMap called fluentd-gcp-config-v1.2.5. There is some useful information regarding how to config fluentd as non-managed. More details here.
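As a starting point, here is a sketch of a fluentd parser filter that parses the escaped JSON in the log field back into top-level fields. The kubernetes.** tag pattern is an assumption about how your container logs are tagged in your setup:

```
<filter kubernetes.**>
  @type parser
  key_name log          # parse the nested, escaped JSON in the "log" field
  reserve_data true     # keep the other fields (stream, time) alongside it
  <parse>
    @type json
  </parse>
</filter>
```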
Please let me know if that helped.

Forwarding logs from kubernetes to splunk

I'm pretty much new to Kubernetes and don't have hands-on experience with it.
My team is facing an issue regarding the log format pushed by Kubernetes to Splunk.
The application pushes logs to stdout in this format:
{"logname" : "app-log", "level" : "INFO"}
Splunk eventually gets this format (splunkforwarder is used):
{
"log" : "{\"logname\": \"app-log\", \"level\": \"INFO \"}",
"stream" : "stdout",
"time" : "2018-06-01T23:33:26.556356926Z"
}
This format makes things harder in Splunk when querying based on properties.
Are there any options in Kubernetes to forward the raw logs from the app rather than wrapping them in another JSON object?
I came across this post on Splunk, but the configuration is done on the Splunk side.
Please let me know if we have any option on the Kubernetes side to send raw logs from the application.
Kubernetes architecture provides three ways to gather logs:
1. Use a node-level logging agent that runs on every node.
You can implement cluster-level logging by including a node-level logging agent on each node. The logging agent is a dedicated tool that exposes logs or pushes logs to a backend. Commonly, the logging agent is a container that has access to a directory with log files from all of the application containers on that node.
The log format depends on the Docker settings. You need to set the log-driver parameter in /etc/docker/daemon.json on every node.
For example,
{
  "log-driver": "syslog"
}
or
{
  "log-driver": "json-file"
}
none - no logs are available for the container, and docker logs does not return any output.
json-file - the logs are formatted as JSON. This is the default logging driver for Docker.
syslog - writes logging messages to the syslog facility.
For more options, check the link
2. Include a dedicated sidecar container for logging in an application pod.
You can use a sidecar container in one of the following ways:
The sidecar container streams application logs to its own stdout.
The sidecar container runs a logging agent, which is configured to pick up logs from an application container.
By having your sidecar containers stream to their own stdout and stderr streams, you can take advantage of the kubelet and the logging agent that already run on each node. The sidecar containers read logs from a file, a socket, or journald, and each sidecar container prints the logs to its own stdout or stderr stream.
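A sketch of the first variant, assuming the app writes to /var/log/app.log on a shared emptyDir volume (the image and file names are illustrative):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: app-with-logging-sidecar
spec:
  containers:
  - name: app
    image: my-app:latest            # hypothetical application image
    volumeMounts:
    - name: logs
      mountPath: /var/log
  - name: log-streamer              # sidecar: streams the log file to its own stdout
    image: busybox
    args: [/bin/sh, -c, 'tail -n+1 -F /var/log/app.log']
    volumeMounts:
    - name: logs
      mountPath: /var/log
  volumes:
  - name: logs
    emptyDir: {}
```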
3. Push logs directly to a backend from within an application.
You can implement cluster-level logging by exposing or pushing logs directly from every application.
For more information, you can check official documentation of Kubernetes
This week we had the same issue. We are using the Splunk forwarder DaemonSet; installing the https://splunkbase.splunk.com/app/3743/ add-on on Splunk will solve your issue.
Just want to share the solution we tried; this worked for our log structure:
SEDCMD-1_unjsonify = s/{"log":"(?:\\u[0-9]+)?(.*?)\\n","stream.*/\1/g
SEDCMD-2_unescapequotes = s/\\"/"/g
BREAK_ONLY_BEFORE={"logname":
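These settings live in a props.conf stanza on the Splunk side; a sketch assuming a sourcetype named kube:container:app (the stanza name is hypothetical, the rest is taken from above):

```
[kube:container:app]
SEDCMD-1_unjsonify = s/{"log":"(?:\\u[0-9]+)?(.*?)\\n","stream.*/\1/g
SEDCMD-2_unescapequotes = s/\\"/"/g
BREAK_ONLY_BEFORE = {"logname":
```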

K8S EFK (especially Fluentd) daemonset for docker journald logging driver

Question
Are there any known Fluentd DaemonSets for the journald Docker logging driver, so that I can send K8S pod logs to Elasticsearch?
Background
As in add support to log in kubeadm, the default logging driver for K8S installed by kubeadm is journald.
the community is collectively moving away from files on disk at every place possible in general, and this would unfortunately be a step backwards. ...
You can edit your /etc/docker/daemon.json to set the default log driver to json-file, and set a max size and max number of files to take care of log rotation. After that, logs won't be written to journald and you will be able to send your log files to ES.
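A sketch of such a /etc/docker/daemon.json (the size and file-count values are illustrative):

```json
{
  "log-driver": "json-file",
  "log-opts": {
    "max-size": "100m",
    "max-file": "5"
  }
}
```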
However, the K8S EFK addon, Fluentd K8S, and Aggregated Logging in Tectonic still expect to find files in /var/log/containers on the host, if I understood correctly.
The Alternative fluentd docker image, designed as a drop-in replacement for the fluentd-es-image, looks to be adopting the journald driver. However, I could not make its pods run.
The docker log driver journald sends docker logs to systemd-journald.service, so we need to make systemd-journald store logs persistently in /var/log/journal.
Edit /etc/systemd/journald.conf:
...
[Journal]
Storage=persistent
#Compress=yes
...
Then restart to apply the changes:
systemctl restart systemd-journald
ls -l /var/log/journal
As /var/log has already been mounted into the fluentd pod, that's all that is needed; restart the fluentd pod. It works for me (as of 2021-04).
By the way, I am using the fluentd yaml from:
https://github.com/fluent/fluentd-kubernetes-daemonset/blob/master/fluentd-daemonset-elasticsearch-rbac.yaml
and the env FLUENTD_SYSTEMD_CONF value should not be disable.
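In that DaemonSet yaml this corresponds to an env entry along these lines (the value just needs to be something other than disable; "enable" here is illustrative):

```yaml
env:
  - name: FLUENTD_SYSTEMD_CONF
    value: "enable"   # anything but "disable" keeps the systemd input config active
```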