How can I get log rotation working inside a kubernetes container/pod? - kubernetes

Our setup:
We are using kubernetes in GCP.
We have pods that write logs to a shared volume, with a sidecar container that sucks up our logs for our logging system.
We cannot just use stdout instead for this process.
Some of these pods are long lived and are filling up disk space because of no log rotation.
Question:
What is the easiest way to prevent the disk space from filling up here (without scheduling pod restarts)?
I have been attempting to install logrotate using: RUN apt-get install -y logrotate in our Dockerfile and placing a logrotate config file in /etc/logrotate.d/dynamicproxy but it doesnt seem to get run. /var/lib/logrotate/status never gets generated.
I feel like I am barking up the wrong tree or missing something integral to getting this working. Any help would be appreciated.

We ended up writing our own daemonset to properly collect the logs from the nodes instead of the container level. We then stopped writing to shared volumes from the containers and logged to stdout only.
We used fluentd to the logs around.
https://github.com/splunk/splunk-connect-for-kubernetes/tree/master/helm-chart/splunk-kubernetes-logging

In general, you should write logs to stdout and configure log collection tool like ELK stack. This is the best practice.
However, if you want to run logrotate as a separate process in your container - you may use Supervisor, which serves as a very simple init system and allows you to run as many parallel process in container as you want.
Simple example for using Supervisor for rotating Nginx logs can be found here: https://github.com/misho-kr/docker-appliances/tree/master/nginx-nodejs

If you write to the filesystem the application creating the logs should be responsible for rotation. If you are running a java application with logback or log4j it is simple configuration change. For other languages/frameworks it is usually similar.
If that is not an option you could use a specialized tool to handle the rotation and piping the output to it. One example would be http://cr.yp.to/daemontools/multilog.html
As method of last resort you could investigate to log into a named pipe (FIFO) instead of a real file and have some other process handling the retrieval and writing of the data - including the rotation.

Related

Is possible for a container to send kafka event when finishes?

We just migrated to a kubernetes cluster, I was wondering if it is possible to send a kafka event when a container/pod finishes automatically with the stdout as message. Right now we are using fluentd with elastic search but the output of a pod is used as input for the next one, we need to poll constantly elastic search for when the output is ready and that causes performance issues on overall execution
I'm not sure of your current setup but my first thought would jump to:
Use something such as fluentd or Logstash on it's own pod per node
Configure volume access to Kubernetes log folder /var/log/containers/*
Use the Kafka output for either fluentd or Logstash with file input (tail) on the logging folder
This approach would require the configuration above on each node however but requires minimal configuration of logging locations etc..
It's not something I've personally configured but have considered it for the future.
More info here

how kubernetes deal with file write locker accross multi pods when hostpath Volumes concerned

I got app that logs to file my_log/1.log, and then I use filebeat to collect the logs from the file
Now I use k8s to deploy it into some nodes, and use hostpath type Volumes to mounts my_log file to the local file syetem, /home/my_log, suddenly I found a subtle situation:
what will it happened if more than one pod deployed on this machine, and then they try to write the log at the same time?
I know that in normal situation, multi-processing try to write to a file at the same time, the system will lock the file,so these processes can write one by one, BUT I am not sure will k8s diffirent pods will not share the same lock space, if so, it will be a disaster.
I try to test this and it seems diffirent pods will still share the file lock,the log file seems normal
how kubernetes deal with file write locker accross multi pods when hostpath Volumes concerned
It doesn't.
Operating System and File System are handling that.
As an example let's take syslog. It handles it by opening a socket, setting the socket to server mode, opening a log file in write mode, being notified of packages, parsing the message and finally writing it to the file.
Logs can also be cached, and the process can be limited to 1 thread, so you should not have many pods writing to one file. This could lead to issues like missing logs or lines being cut.
Your application should handle the file locking to push logs, also if you want to have many pods writing logs, you should have a separate log file for each pod.

Logging Kubernetes with an external ELK stack

Is there any documentation out there on sending logs from containers in K8s to an external ELK cluster running on EC2 instances?
We're in the process of trying to Kubernetes set up and I'm trying to figure out how to get the logging to work correctly. We already have an ELK stack setup on EC2 for current versions of the application but most of the documentation out there seems to be referring to ELK as it's deployed to the K8s cluster.
I am also working on the same cause.
First you should know what driver is being used by your docker containers to manage the logs (json driver/ journald etc - read here).
After that you should use some log collector in your architecture to send the logs to the Logstash endpoint. You can use filebeat/fluent bit. They are light weight alternatives to logstash/fluentd respectively. You must use one of them and not directly send your logs to logstash via syslog since these log shippers have a special functionality of enriching your logs with kubernetes metadata of the respective containers.
There might be lot of challenges after that. Parsing log data (multiline logs for example) etc. For an efficient pipeline, it’s better to do most of the work (i.e. extracting the date object from the logs etc) at the log sender side, than using the common logstash for this purpose that might be a bottle-neck.
Note that in case the container logs are not sent to stdout/stderr but written else-where, you might need to run filebeat/fluent-bit as side-car with your containers.
As for the links for documentation are concerned, I myself didn’t find anything documented in a single place on this, but the keywords that I mentioned over, reading about them I got to know many things.
Hope this helps.

Can we consider using containers (and kubernetes) for monolith, stateful web applications?

I'm learning about Containers and Kubernetes and was evaluating if we can move our monolith, stateful appplication to kubernetes?
I was also looking at https://kubernetes.io/blog/2018/03/principles-of-container-app-design/ and "Self-Containment" looks close. We can consider using "storage".
Properties of my application:
1. Runs on a JVM
2. Does not have a database. Saves all its data/content to TAR files on the file-system
3. Should be able to backup and retain state if the container goes down.
In our current scenarios, we deploy the app to a VM and our IT teams generally take snapshots of these VM's as backups and restore them if the app fails or they have to restore to a point where the app was working good. I wanted to avoid this.
Please advice.
You call it as web application, but based on what it does it just a process which writes to file system.
If you move to k8s, write to NFS or persistent storage from pod. If you can only run one instance, then you can't use k8s horizontal scaling.

"Injecting" configuration files at startup

I have a number of legacy services running which read their configuration files from disk and a separate daemon which updates these files as they change in zookeeper (somewhat similar to confd).
For most of these types of configuration we would love to move to a more environment variable like model, where the config is fixed for the lifetime of the pod. We need to keep the outside config files as the source of truth as services are transitioning from the legacy model to kubernetes, however. I'm curious if there is a clean way to do this in kubernetes.
A simplified version of the current model that we are pursuing is:
Create a docker image which has a utility for fetching config files and writing them to disk ones. Then writes a /donepath/done file.
The main image waits until the done file exists. Then allows the normal service startup to progress.
Use an empty dir volume and volume mounts to get the conf from the helper image into the main image.
I keep seeing instances of this problem where I "just" need to get a couple of files into the docker image at startup (to allow per-env/canary/etc variance), and running all of this machinery each time seems like a burden throw on devs. I'm curious if there is a more simplistic way to do this already in kubernetes or on the horizon.
You can use the ADD command in your Dockerfile. It is used as ADD File /path/in/docker. This will allow you to add a complete file quickly to your container. You need to have the file you want to add to the image in the same directory as the Dockerfile when you build the container. You can also add a tar file this way which will be expanded during the build.
Another option is the ENV command in a your Dockerfile. This adds the data as an environment variable.