Store Winston.js log files on GKE - Kubernetes

I'm using winston.js for logging in my Node.js app. I'm deploying my dockerized app on GKE and want to store my log files outside the container. Where should I store those files, and what path should be mounted?
I'm really new to Kubernetes volumes and can't find the right tutorial to follow.

There are multiple options here, but the first question is: why do you want to store logs in files at all?
You can simply push the logs to GCP's Stackdriver service and your logs will be stored there; in that case no volume or extra configuration is required.
Stackdriver Logging is a managed service: you can search the logs and push logs from all the containers. This way your service stays stateless and you don't need to configure a volume for your deployment. Since the container or Pod running inside the cluster is stateless, you can also scale the application easily.
Still, if you are planning to use a volume, there are several options, described below:
Use a Node volume (hostPath):
Here the container creates the log files inside a volume on the node it's running on; see the sketch after this option.
Example: https://kubernetes.io/docs/concepts/storage/volumes/#hostpath
Cons:
Logs are removed as soon as the node is autoscaled away or removed from the GKE cluster during maintenance.
The container can be scheduled to any node at any time, so the sequence of logs across nodes can become an issue.
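For illustration, a minimal sketch of a Pod using a hostPath volume for log files; the image name and the /var/log/myapp path are assumptions, not taken from the question:

apiVersion: v1
kind: Pod
metadata:
  name: myapp
spec:
  containers:
  - name: myapp
    image: myapp:latest              # assumed image of the Node.js/winston app
    volumeMounts:
    - name: log-dir
      mountPath: /usr/src/app/logs   # point winston's file transport here
  volumes:
  - name: log-dir
    hostPath:
      path: /var/log/myapp           # directory on the node's own filesystem
      type: DirectoryOrCreate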
Use a PVC disk:
If you plan to keep your logs for a long time, e.g. 30-40 days, a persistent disk will work.
You create a volume which is used by the Pod in K8s; the Pod can use this volume to create and store files, and they won't be deleted from the disk unless you delete them yourself. A sketch follows after this option.
Example: https://cloud.google.com/kubernetes-engine/docs/concepts/persistent-volumes
Cons:
Only a single Pod (replica) can attach to the volume if you use the ReadWriteOnce access mode, which can be an issue if you are planning to autoscale or want many replicas.
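A minimal sketch of the PVC approach on GKE, assuming the cluster's default StorageClass provisions a persistent disk; names and sizes are illustrative:

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: app-logs-pvc           # hypothetical claim name
spec:
  accessModes:
  - ReadWriteOnce              # only one node can mount it read-write
  resources:
    requests:
      storage: 10Gi            # assumed size
---
apiVersion: v1
kind: Pod
metadata:
  name: myapp
spec:
  containers:
  - name: myapp
    image: myapp:latest        # assumed image
    volumeMounts:
    - name: logs
      mountPath: /usr/src/app/logs
  volumes:
  - name: logs
    persistentVolumeClaim:
      claimName: app-logs-pvc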
NFS & EFS
You have to use the ReadWriteMany access mode if you want all replicas or Pods in K8s to store their logs on the same volume.
For the ReadWriteMany option you can use any NFS service, or a managed file service (AWS EFS, or Filestore on GCP).
In this case all your Pods write to a single NFS/Filestore volume and the logs are saved there, but extra configuration is required; see the sketch after this option.
Example: https://medium.com/@Sushil_Kumar/readwritemany-persistent-volumes-in-google-kubernetes-engine-a0b93e203180
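As a rough illustration, a ReadWriteMany volume backed by an existing NFS or Filestore export might look like this; the server address, export path and sizes are placeholders:

apiVersion: v1
kind: PersistentVolume
metadata:
  name: logs-nfs-pv
spec:
  capacity:
    storage: 50Gi
  accessModes:
  - ReadWriteMany
  nfs:
    server: 10.0.0.2           # placeholder: NFS/Filestore IP
    path: /exports/logs        # placeholder: export path
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: logs-nfs-pvc
spec:
  accessModes:
  - ReadWriteMany
  storageClassName: ""         # bind to the manually created PV above
  resources:
    requests:
      storage: 50Gi

All replicas can then mount logs-nfs-pvc and write to the same share.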
Extra:
The best option is to push logs to Stackdriver without much log configuration at all, and you can manage the retention period there. Just start pushing logs from the application container and you can scale replicas seamlessly.

Why should I use Kubernetes Persistent Volumes instead of Volumes

To use storage inside Kubernetes Pods I can use volumes and persistent volumes. While volumes like emptyDir are ephemeral, I could use hostPath and many other cloud-based volume plugins which provide a persistent solution within volumes themselves.
In that case, why should I use PersistentVolumes at all?
It is very important to understand the main differences between Volumes and PersistentVolumes. Both are Kubernetes resources which provide an abstraction of a data storage facility.
Volumes: let your pod write to a filesystem that exists as long as the pod exists. They also let you share data between containers in the same pod, but data in that volume is destroyed when the pod is restarted. A Volume decouples the storage from the container; its lifecycle is coupled to a pod.
PersistentVolumes: serve as long-term storage in your Kubernetes cluster. They exist beyond containers, pods, and nodes. A pod uses a PersistentVolumeClaim to get read and write access to the PersistentVolume. A PersistentVolume decouples the storage from the Pod; its lifecycle is independent. It enables safe pod restarts and sharing data between pods.
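To make the Volumes point concrete, here is a minimal sketch of an emptyDir volume shared between two containers in one Pod; the data disappears when the Pod goes away (container names and commands are illustrative):

apiVersion: v1
kind: Pod
metadata:
  name: shared-scratch
spec:
  containers:
  - name: writer                    # hypothetical producer container
    image: busybox
    command: ["sh", "-c", "echo hello > /scratch/data.txt && sleep 3600"]
    volumeMounts:
    - name: scratch
      mountPath: /scratch
  - name: reader                    # hypothetical consumer container
    image: busybox
    command: ["sh", "-c", "sleep 3600"]
    volumeMounts:
    - name: scratch
      mountPath: /scratch
  volumes:
  - name: scratch
    emptyDir: {}                    # lives exactly as long as this Pod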
When it comes to hostPath:
A hostPath volume mounts a file or directory from the host node's filesystem into your Pod.
hostPath has its usage scenarios, but in general it is not recommended, for several reasons:
Pods with identical configuration (such as created from a PodTemplate) may behave differently on different nodes due to different files on the nodes
The files or directories created on the underlying hosts are only writable by root. You either need to run your process as root in a privileged Container or modify the file permissions on the host to be able to write to a hostPath volume
You don't always directly control which node your pods will run on, so you're not guaranteed that the pod will actually be scheduled on the node that has the data volume.
If a node goes down, the pod needs to be scheduled on another node, where your locally provisioned volume will not be available.
hostPath is a good fit if, for example, you want to use it for a log collector running in a DaemonSet (see the sketch below).
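A rough sketch of that DaemonSet pattern, mounting the node's own log directory into a collector container; the collector image is an illustrative assumption:

apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: log-collector
spec:
  selector:
    matchLabels:
      app: log-collector
  template:
    metadata:
      labels:
        app: log-collector
    spec:
      containers:
      - name: collector
        image: fluent/fluentd:edge    # assumed collector image
        volumeMounts:
        - name: varlog
          mountPath: /var/log
          readOnly: true
      volumes:
      - name: varlog
        hostPath:
          path: /var/log              # the node's log directory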
I recommend the Kubernetes Volumes Guide as a nice supplement to this topic.
PersistentVolumes are cluster-wide storage and allow you to manage storage more centrally.
When you configure a volume (either using hostPath or any of the cloud-based volume plugins), you do that configuration within the Pod definition file. All of the configuration information required to set up storage for the volume goes into the Pod definition file.
When you have a large environment with many users and a large number of Pods, users have to configure storage every time for each Pod they deploy. Whatever storage solution is used, the user who deploys the Pod has to configure that storage in all of their Pod definition files, and if a change is needed it has to be made in every one of them. Beyond a certain scale, this is not an optimal way to manage storage.
Instead, you would like to manage this centrally: an administrator can create a large pool of storage and users can carve out a part of it as required. This is exactly what you can do using PersistentVolumes and PersistentVolumeClaims, as sketched below.
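For example, a minimal sketch of this split, assuming an NFS-backed PV created by an administrator and a claim created by a user; the server address, sizes and names are placeholders:

# Created by the administrator: a piece of the storage pool
apiVersion: v1
kind: PersistentVolume
metadata:
  name: pv-pool-01
spec:
  capacity:
    storage: 100Gi
  accessModes:
  - ReadWriteOnce
  nfs:
    server: 10.0.0.2          # placeholder NFS server
    path: /exports/pool01
---
# Created by the user: carve out what the app needs
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: app-data
spec:
  accessModes:
  - ReadWriteOnce
  storageClassName: ""        # bind to a pre-created PV rather than provisioning
  resources:
    requests:
      storage: 20Gi
---
# The Pod only references the claim, never the storage details
apiVersion: v1
kind: Pod
metadata:
  name: app
spec:
  containers:
  - name: app
    image: myapp:latest       # assumed image
    volumeMounts:
    - name: data
      mountPath: /data
  volumes:
  - name: data
    persistentVolumeClaim:
      claimName: app-data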
Use PersistentVolumes when you need to set up a database like MongoDB, Redis, Postgres or MySQL, because they are long-term storage that is not deeply coupled with your pods, so the data will not die with them. Perfect for database applications.
Avoid plain Volumes when you need long-term storage, because they will die with the pods!
In my case, when I have to store something, I will always go for persistent volumes!

Kubernetes Persistent Volume for Pod in Production

I set up a scenario where I deployed a Microsoft SQL database on my K8s cluster using a PV and PVC. It works well, but I see some strange behaviour: the PV I created is only visible on one node and not on the other worker nodes. What am I missing here? Any input is appreciated.
Background:
Server 1 - Master
Server 2 - Worker
Server 3 - Worker
Server 4 - Worker
Pod : "MyDb" is running on Server (Node) 4 without any replica set. I am guessing because my POD is running on server-4, PV got created on server four when created POD and refer PVC (claim) in it.
Please let me know your thought on this issue or share your inputs about mounting shared disk in production cluster.
Those who want to deploy SQL DB on K8s cluster, can refer blog posted by Philips. Link below,
https://www.phillipsj.net/posts/sql-server-on-linux-on-kubernetes-part-1/ (without PV)
https://www.phillipsj.net/posts/sql-server-on-linux-on-kubernetes-part-2/ (with PV and Claim)
Regards,
Farooq
Please see below my findings on my original problem statement.
Problem: The Pod for SQL Server was created. At runtime K8s scheduled this pod on server-4 and hence created the PV on server-4. However, the PV path (/tmp/sqldata) wasn't created on the other nodes.
I shut down the server-4 node and ran the command to delete the SQL pod (no replicas were used initially).
The status of the Pod changed to "Terminating".
Nothing happened for a while.
I restarted server-4 and noticed the Pod got deleted immediately.
Next steps:
- I stopped server-4 again and created the same pod.
- The Pod was created on the server-3 node at runtime, and I see the PV (/tmp/sqldata) was created on server-3 as well. However, all my data (e.g. the sample tables) was lost. It is a fresh new PV on server-3 now.
I had assumed the PV would be a mounted volume of external storage, not storage/disk from a node in the cluster.
I am guessing that because my Pod is running on server-4, the PV got created on server-4 when I created the Pod and referenced the PVC (claim) in it.
This is more or less correct, and you should be able to verify it by simply deleting the Pod and recreating it (since you say you do not have a ReplicaSet doing that for you). The PersistentVolume will then be visible on the node where the Pod is scheduled.
Edit: The above assumes that you are using an external storage provider such as NFS or AWS EBS (see possible storage providers for Kubernetes). With hostPath the above does NOT apply: the PV is created locally on a node (and will not be mounted to another node).
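For reference, a hostPath-backed PV like the one described in the question might look roughly like this (capacity and names are assumptions); the path exists only on whichever node the Pod happens to run on, which explains the behaviour you saw:

apiVersion: v1
kind: PersistentVolume
metadata:
  name: sqldata-pv
spec:
  capacity:
    storage: 10Gi             # assumed size
  accessModes:
  - ReadWriteOnce
  hostPath:
    path: /tmp/sqldata        # created on the node that runs the Pod
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: sqldata-pvc
spec:
  accessModes:
  - ReadWriteOnce
  storageClassName: ""        # bind to the manually created PV
  resources:
    requests:
      storage: 10Gi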
There is no reason to mount the PersistentVolume on the other nodes as well. Imagine having hundreds of nodes: would you want to mount your PersistentVolume on all of them while your Pod is running on just one?
You are also asking about "shared" disks. The PersistentVolume created in the blog post you linked is using ReadWriteMany, so you actually can start multiple Pods accessing the same volume (given that your storage supports that as well). But your software (a database in your case) needs to support having multiple processes accessing the same data.
Especially when considering databases, you should also look into StatefulSets, as these basically allow you to define Pods that always use the same storage, which can be very interesting for databases (see the sketch below). Whether you should or should not run databases on Kubernetes is a whole different topic...
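A minimal sketch of that StatefulSet idea, using volumeClaimTemplates so each replica gets its own stable PVC; the image, secret and sizes are illustrative placeholders, not the exact manifest from the linked blog:

apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: mydb
spec:
  serviceName: mydb            # assumes a matching headless Service
  replicas: 1
  selector:
    matchLabels:
      app: mydb
  template:
    metadata:
      labels:
        app: mydb
    spec:
      containers:
      - name: mssql
        image: mcr.microsoft.com/mssql/server:2019-latest   # assumed image
        env:
        - name: ACCEPT_EULA
          value: "Y"
        - name: SA_PASSWORD                                  # placeholder secret
          valueFrom:
            secretKeyRef:
              name: mssql-secret
              key: SA_PASSWORD
        volumeMounts:
        - name: sqldata
          mountPath: /var/opt/mssql
  volumeClaimTemplates:          # one PVC (and PV) per replica, reattached on restart
  - metadata:
      name: sqldata
    spec:
      accessModes: ["ReadWriteOnce"]
      resources:
        requests:
          storage: 10Gi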

How do I mount data into persisted storage on Kubernetes and share the storage amongst multiple pods?

I am new to Kubernetes and am trying to understand the most efficient and secure way to handle sensitive persisted data that interacts with a K8s pod. I have the following requirements when I start a pod in a K8s cluster:
The pod should have persisted storage.
Data inside the pod should be persistent even if the pod crashes or restarts.
I should be able to easily add or remove data from hostPath into the pod. (Not sure if this is feasible since I do not know how the data will behave if the pod starts on a new node in a multi node environment. Do all nodes have access to the data on the same hostPath?)
Currently, I have been using StatefulSets with a persistent volume claim on GKE. The image that I am using has a couple of constraints as follows:
I have to mount a configuration file into the pod before it starts. (I am currently using configmaps to pass the configuration file)
The pod that comes up creates its own TLS certificates, which I need to pass to other pods. (Currently I do not have a process in place to do this and have been manually copy-pasting these certificates into other pods.)
So, how do I maintain a common persisted storage that handles sensitive data between multiple pods and how do I add pre-configured data to this storage? Any guidance or suggestions are appreciated.
I believe this documentation on creating a persistent disk with multiple readers [1] is what you are looking for. You will, however, only be able to have the pods read from the disk, since GCE persistent disks do not support the ReadWriteMany access mode [2].
Regarding hostPath: the mount point is in the pod, while the volume is a directory on the node. I believe the hostPath data is confined to the individual node.
[1] https://cloud.google.com/kubernetes-engine/docs/how-to/persistent-volumes/readonlymany-disks
[2] https://kubernetes.io/docs/concepts/storage/persistent-volumes/#access-modes
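Assuming you pre-populate a GCE persistent disk and then expose it read-only to many pods, an approach along the lines of [1] might look roughly like this; the disk name and size are placeholders:

apiVersion: v1
kind: PersistentVolume
metadata:
  name: shared-data-pv
spec:
  capacity:
    storage: 50Gi
  accessModes:
  - ReadOnlyMany
  gcePersistentDisk:
    pdName: shared-data-disk   # placeholder: pre-populated GCE persistent disk
    fsType: ext4
    readOnly: true
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: shared-data-pvc
spec:
  accessModes:
  - ReadOnlyMany
  storageClassName: ""
  resources:
    requests:
      storage: 50Gi

Each pod then mounts shared-data-pvc read-only, and all replicas see the same data.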

MinIO on Kubernetes: Best way to back it up

I have tried to deploy it in the following 2 ways so far:
1. Deploy it as a StatefulSet on Kubernetes using an NFS-backed persistent volume as its storage. But then I learnt that we shouldn't run MinIO backed by NAS, as erasure code should not be run on NAS data.
2. Deploy it as a DaemonSet using local volumes. This time I attached separate disks to my nodes and labelled them so that MinIO is scheduled to run on these nodes. The disks are mounted on /data/minio on each of the nodes.
But now, every day, the nodes come under disk pressure and the MinIO pods are getting evicted. When I check the kubelet logs:
Aug 13 21:05:45 staging-node2 kubelet[2188]: I0813 21:05:45.968179 2188 kubelet_pods.go:1073] Killing unwanted pod "minio-kjrkc"
Aug 13 21:05:45 staging-node2 kubelet[2188]: I0813 21:05:45.975372 2188 kuberuntime_container.go:559] Killing container "docker://6da1247718f8e6c92399e231f8c31ff1c510737c658ac2aca87c1659aa6b51cc" with 30 second grace period
It tries to kill the pods but the container never dies. Even when MinIO gets the TERM signal, the container stays up.
What other options are left for on-prem MinIO installations?
Is using local storage somehow not recommended, or am I using it incorrectly?
Any idea if I have to explicitly configure a pre-hook for MinIO to accept the terminate signal?
The right approach to deploy MinIO on Kubernetes is to use local volumes on multiple distributed nodes with a StatefulSet. You can do this via YAML files, a Helm chart, or our MinIO Operator. Docs are available here: https://github.com/minio/minio/tree/master/docs/orchestration/kubernetes
What other options are left for on-prem MinIO installations?
A StatefulSet with at least 4 Pods and storage via hostPath volumes is generally the right way to deploy; a sketch follows below. Depending on the use case and existing infrastructure there may be other relevant approaches too.
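For illustration, a minimal sketch of such a StatefulSet, assuming 4 replicas, a headless Service named minio, a credentials Secret, and local disks already mounted at /data/minio on the labelled nodes; none of these names come from your setup:

apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: minio
spec:
  serviceName: minio               # assumes a matching headless Service
  replicas: 4
  selector:
    matchLabels:
      app: minio
  template:
    metadata:
      labels:
        app: minio
    spec:
      containers:
      - name: minio
        image: minio/minio         # assumed image, pin a tag in practice
        args:
        - server
        # distributed mode: one endpoint per replica (placeholder DNS names)
        - http://minio-{0...3}.minio.default.svc.cluster.local/data
        env:
        - name: MINIO_ACCESS_KEY
          valueFrom:
            secretKeyRef:
              name: minio-creds    # placeholder Secret
              key: access-key
        - name: MINIO_SECRET_KEY
          valueFrom:
            secretKeyRef:
              name: minio-creds
              key: secret-key
        volumeMounts:
        - name: data
          mountPath: /data
      volumes:
      - name: data
        hostPath:
          path: /data/minio        # the locally attached disk on each node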
Is using local storage somehow not recommended, or am I using it incorrectly?
Local storage is recommended. I'll need more info to see what exactly is going wrong here, but if you have dedicated local storage drives, you should be fine.
Any idea if I have to explicitly configure a pre-hook for MinIO to accept the terminate signal?
MinIO handles termination signals itself; there is no pre-hook needed as such.
Hope this helps. Please join us on our Slack channel (https://slack.min.io) for detailed discussion.

Apache Kafka - Volume Mapping for Message Log files in Kubernetes (K8s)

When we deploy Apache Kafka on Linux/Windows, we have the log.dirs and broker.id properties. On bare metal, the files are saved on the individual host instances. However, when deploying via K8s on a public cloud, there must be some form of volume mounting to make sure that the transaction log files are saved somewhere?
Has anyone done this on K8s? I am not referring to Confluent (because it's a paid subscription).
As far as I understand you are just asking how to deal with storage in Kubernetes.
Here is a great clip that talks about Kubernetes storage that I would recommend to you.
In Kubernetes you use Volumes:
On-disk files in a Container are ephemeral, which presents some problems for non-trivial applications when running in Containers. First, when a Container crashes, kubelet will restart it, but the files will be lost - the Container starts with a clean state. Second, when running Containers together in a Pod it is often necessary to share files between those Containers. The Kubernetes Volume abstraction solves both of these problems.
There are many types of Volumes; some are cloud-specific, like awsElasticBlockStore, gcePersistentDisk, azureDisk and azureFile.
There are also other types like glusterfs, iscsi, nfs and many more that are listed here.
You can also use PersistentVolumes, which provide an API for users and administrators that abstracts the details of how storage is provided from how it is consumed:
A PersistentVolume (PV) is a piece of storage in the cluster that has been provisioned by an administrator. It is a resource in the cluster just like a node is a cluster resource. PVs are volume plugins like Volumes, but have a lifecycle independent of any individual pod that uses the PV. This API object captures the details of the implementation of the storage, be that NFS, iSCSI, or a cloud-provider-specific storage system.
A PersistentVolumeClaim (PVC) is a request for storage by a user. It is similar to a pod. Pods consume node resources and PVCs consume PV resources. Pods can request specific levels of resources (CPU and Memory). Claims can request specific size and access modes (e.g., can be mounted once read/write or many times read-only).
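Tying this back to the question, a minimal sketch of a Kafka StatefulSet that maps log.dirs onto one PVC per broker via volumeClaimTemplates; the image, the env-based configuration and the sizes are assumptions, not a production-ready manifest:

apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: kafka
spec:
  serviceName: kafka               # assumes a matching headless Service
  replicas: 3
  selector:
    matchLabels:
      app: kafka
  template:
    metadata:
      labels:
        app: kafka
    spec:
      containers:
      - name: kafka
        image: wurstmeister/kafka  # assumed community image
        env:
        - name: KAFKA_LOG_DIRS     # maps to the log.dirs broker property
          value: /var/lib/kafka/data
        # broker.id is typically derived from the pod ordinal in a startup script
        volumeMounts:
        - name: data
          mountPath: /var/lib/kafka/data
  volumeClaimTemplates:            # one PersistentVolumeClaim per broker pod
  - metadata:
      name: data
    spec:
      accessModes: ["ReadWriteOnce"]
      resources:
        requests:
          storage: 20Gi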
Here is a link to Portworx Kafka Kubernetes in production: How to Run HA Kafka on Amazon EKS, GKE and AKS which might be handy for you as well.
And if you are interested in performance, then Kubernetes Storage Performance Comparison is a great 10-minute read.
I hope those materials will help you understand Kubernetes storage.