Is prior disk addition a must before deploying OpenEBS? - kubernetes

I have a 3-node k8s cluster and a remote storage box with additional disks connected to it. I want to utilize these disks. So is this use case supported by OpenEBS? Also, do I have to attach the disks to the nodes before deploying OpenEBS? Is this a prerequisite?

Sure, it's supported, and you need the disks attached when you set up OpenEBS as your block storage.
After you set it up, you can create volumes (PVCs, PVs) for Kubernetes and mount them on your pods for consumption.
You can set up OpenEBS on the Kubernetes cluster where you run your workloads, using either Helm or kubectl.
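For illustration, a minimal sketch of the consumption step: a PVC against an OpenEBS StorageClass plus a Pod mounting it. The class name openebs-jiva-default is an assumption based on older OpenEBS releases; substitute whatever class your install actually provides.

```yaml
# Hypothetical PVC bound to an OpenEBS StorageClass, plus a Pod
# consuming it. The class name is an assumption; check with
# `kubectl get storageclass` on your cluster.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: demo-pvc
spec:
  storageClassName: openebs-jiva-default
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 5Gi
---
apiVersion: v1
kind: Pod
metadata:
  name: demo-pod
spec:
  containers:
    - name: app
      image: busybox
      command: ["sleep", "3600"]
      volumeMounts:
        - name: data
          mountPath: /data
  volumes:
    - name: data
      persistentVolumeClaim:
        claimName: demo-pvc
```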

Yes, OpenEBS supports storage with additional disks connected. As of 0.7 it has a feature called NDM (Node Disk Manager), which monitors the disks attached to the nodes. Once the disks are attached, you can create a pool on top of them and use it, roughly as sketched below. For more details, see the documentation link.
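As a rough sketch of that pool step, here is a StoragePoolClaim in the 0.7/0.8-era shape; treat the field names and the disk ID as assumptions, since this API changed between OpenEBS versions.

```yaml
# Hypothetical StoragePoolClaim from the OpenEBS 0.7/0.8 era.
# The diskList entry is a placeholder for a disk name that NDM
# discovered on your nodes (`kubectl get disks` in those releases).
apiVersion: openebs.io/v1alpha1
kind: StoragePoolClaim
metadata:
  name: cstor-disk-pool
spec:
  name: cstor-disk-pool
  type: disk
  poolSpec:
    poolType: striped
  disks:
    diskList:
      - disk-example-id-from-ndm   # placeholder disk name
```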

Related

MinIO for MariaDB in Kubernetes

I'm running a k3s single-node cluster and have the k3s local-path-provisioner as storage. As I want to be able to add nodes in the future, I looked at MinIO to use on top of the local-path storage. But I'm not sure if it's the right choice, because my workloads primarily use MariaDB for data, and I read that an S3-compatible bucket isn't the best fit for database applications.
I hope you can help me figure this out.
If you don't want to use object storage, then here are your options for running a local storage provisioner:
GlusterFS StorageClass
Doesn't have a lot of documentation on how to set it up, but if you know your way around GlusterFS it'll be a good option.
local-path-provisioner
It provides a way for Kubernetes users to utilize the local storage in each node.
OpenEBS -> has a local volume storage engine, but I think this is not designed to work on a shared volume mount, and it ends up tying a pod to a specific node since the data "doesn't exist" on the other nodes.
Longhorn [recommended]
It creates a dedicated storage controller for each block device volume and synchronously replicates the volume across multiple replicas stored on multiple nodes (see the PVC sketch after this list).
Rook
Rook is a storage operator for Kubernetes; it supports multiple storage backends. Don't use the NFS one, though, because we hit a wall when using it with our DBs.
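If you go the Longhorn route, a PVC against its StorageClass is all a MariaDB pod needs. A minimal sketch, assuming a stock install where the class is named longhorn:

```yaml
# PVC sketch for a MariaDB data volume on Longhorn. The class name
# "longhorn" is the stock default; verify with `kubectl get sc`.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: mariadb-data
spec:
  storageClassName: longhorn
  accessModes:
    - ReadWriteOnce   # block volume, attached to one node at a time
  resources:
    requests:
      storage: 20Gi
```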

Are Pods forced to run on nodes where their persistent volumes exist?

I'm teaching myself Kubernetes with a 5 Rpi cluster, and I'm a bit confused by the way Kubernetes treats Persistent Volumes with respect to Pod Scheduling.
I have 4 worker nodes using ext4 formatted 64GB micro SD cards. It's not going to give GCP or AWS a run for their money, but it's a side project.
Let's say I create a Persistent Volume Claim requesting 10GB of storage on worker1, and I deploy a service which relies on this PVC. Is that service then forced to be scheduled on worker1?
Should I be looking into distributed file systems like Ceph or HDFS so that Pods aren't restricted to being scheduled on a particular node?
Sorry if this seems like a stupid question; I'm self-taught and still trying to figure this stuff out! (Feel free to improve my tl;dr doc for Kubernetes with a pull req.)
Just some examples; as already mentioned, it depends on your storage system. As I can see, you use the local storage option.
Local storage:
Yes, the pod needs to run on the same machine where the PV is located (your case).
iSCSI/Trident SAN:
No, the node where the pod gets scheduled will mount the iSCSI block device.
(As mentioned already, volume binding mode is an important keyword; it's possible you need to set this to 'WaitForFirstConsumer'. See the StorageClass sketch after this list.)
NFS/Trident NAS:
No, it's NFS, mountable from everywhere as long as you can reach it and authenticate against it.
VMware VMDKs:
No, same as iSCSI; the node which gets the pod scheduled mounts the VMDK from the datastore.
Ceph/rook.io:
No, you get three options for storage: file, block, and object storage. Every type is distributed, so you can schedule a pod on every node.
Ceph is also an ideal system for running distributed software-defined storage on commodity hardware. What I can recommend is https://rook.io/, basically open-source Ceph on 'container steroids'.
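To make the volume binding mode point concrete, here is a sketch of a StorageClass using WaitForFirstConsumer. The no-provisioner variant shown is the one used with manually created local volumes; treat the class name as illustrative.

```yaml
# StorageClass sketch for local volumes. WaitForFirstConsumer delays
# PVC binding until a pod using the claim is scheduled, so the
# scheduler can pick a node that can actually reach the volume.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: local-storage
provisioner: kubernetes.io/no-provisioner   # PVs are created by hand
volumeBindingMode: WaitForFirstConsumer
```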
Let's say I create a Persistent Volume Claim requesting 10GB of storage on worker1, and I deploy a service which relies on this PVC. Is that service then forced to be scheduled on worker1?
This is a good question. How this works depends on your storage system. The StorageClass defined for your Persistent Volume Claim contains information about the Volume Binding Mode. It is common to use dynamically provisioned volumes, so that the volume is first allocated when a user/consumer/Pod is scheduled. Typically this volume does not exist on the local node but remotely in the same data center. Kubernetes also has support for Local Persistent Volumes, which are physical volumes located on the same node; they are typically more expensive and used when you need high disk performance and capacity.
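As a sketch of why a local PV pins pods to one node, the nodeAffinity block is the mechanism; the hostname and path below are made up for illustration.

```yaml
# Local PV pinned to worker1: any pod using a PVC bound to this PV
# must be scheduled on worker1. Hostname and path are placeholders.
apiVersion: v1
kind: PersistentVolume
metadata:
  name: local-pv-worker1
spec:
  capacity:
    storage: 10Gi
  accessModes:
    - ReadWriteOnce
  storageClassName: local-storage
  local:
    path: /mnt/disks/ssd1
  nodeAffinity:
    required:
      nodeSelectorTerms:
        - matchExpressions:
            - key: kubernetes.io/hostname
              operator: In
              values:
                - worker1
```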

Opensource Storage Options for Kubernetes Cluster running on bare metal

I have a 3-node cluster running on bare metal. It was set up using kubeadm.
Each node in the cluster has 100GB of disk space, adding up to 300GB in total.
I would like to utilize the 300GB of disk space available on them to run stateful pods like MySQL, PostgreSQL, MongoDB, Cassandra, etc. What are the different open-source options available to create persistent volumes?
I still haven't used Kubernetes v1.14, which offers local persistent volumes out of the box. That would be one option.
A second option is to run an NFS server on each node and utilize the NFS share from the respective machine to create PVs (roughly as sketched below).
Apart from these, what other options can be looked at? Please suggest.
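For reference, the per-node NFS idea from the second option would look roughly like this as a PV; the server address and export path are placeholders.

```yaml
# Hypothetical PV backed by an NFS export running on one of the
# nodes. Server IP and path are placeholders for your setup.
apiVersion: v1
kind: PersistentVolume
metadata:
  name: nfs-pv-node1
spec:
  capacity:
    storage: 100Gi
  accessModes:
    - ReadWriteMany   # NFS can be mounted by many pods at once
  nfs:
    server: 10.0.0.11        # placeholder node address
    path: /srv/nfs/exports   # placeholder export path
```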

Apache Kafka - Volume Mapping for Message Log files in Kubernetes (K8s)

When we deploy Apache Kafka on Linux/Windows, we have the log.dirs and broker.id properties. On bare metal, the files are saved on the individual host instances. However, when deployed via K8s on a public cloud, there must be some form of volume mounting to make sure that the transaction log files are saved somewhere?
Has anyone done this on K8s? I am not referring to Confluent (because it's a paid subscription).
As far as I understand, you are just asking how to deal with storage in Kubernetes.
Here is a great clip that talks about Kubernetes storage that I would recommend to you.
In Kubernetes you are using Volumes:
On-disk files in a Container are ephemeral, which presents some problems for non-trivial applications when running in Containers. First, when a Container crashes, kubelet will restart it, but the files will be lost - the Container starts with a clean state. Second, when running Containers together in a Pod it is often necessary to share files between those Containers. The Kubernetes Volume abstraction solves both of these problems.
There are many types of Volumes; some are cloud-specific, like awsElasticBlockStore, gcePersistentDisk, azureDisk, and azureFile.
There are also other types, like glusterfs, iscsi, nfs, and many more that are listed here.
You can also use Persistent Volumes, which provide an API for users and administrators that abstracts details of how storage is provided from how it is consumed:
A PersistentVolume (PV) is a piece of storage in the cluster that has been provisioned by an administrator. It is a resource in the cluster just like a node is a cluster resource. PVs are volume plugins like Volumes, but have a lifecycle independent of any individual pod that uses the PV. This API object captures the details of the implementation of the storage, be that NFS, iSCSI, or a cloud-provider-specific storage system.
A PersistentVolumeClaim (PVC) is a request for storage by a user. It is similar to a pod. Pods consume node resources and PVCs consume PV resources. Pods can request specific levels of resources (CPU and Memory). Claims can request specific size and access modes (e.g., can be mounted once read/write or many times read-only).
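For Kafka specifically, the usual pattern is a StatefulSet with volumeClaimTemplates, so each broker gets its own PVC mounted where log.dirs points. A trimmed sketch; the image, the env var name, and the storage class are placeholders, not a tested deployment:

```yaml
# Trimmed StatefulSet sketch: each broker pod gets its own PVC via
# volumeClaimTemplates, mounted at the directory log.dirs points to.
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: kafka
spec:
  serviceName: kafka
  replicas: 3
  selector:
    matchLabels:
      app: kafka
  template:
    metadata:
      labels:
        app: kafka
    spec:
      containers:
        - name: kafka
          image: your-kafka-image:latest   # placeholder image
          env:
            - name: KAFKA_LOG_DIRS          # env var name depends on the image
              value: /var/lib/kafka/data
          volumeMounts:
            - name: data
              mountPath: /var/lib/kafka/data
  volumeClaimTemplates:
    - metadata:
        name: data
      spec:
        accessModes: ["ReadWriteOnce"]
        storageClassName: standard          # placeholder class
        resources:
          requests:
            storage: 100Gi
```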
Here is a link to Portworx's guide on running Kafka on Kubernetes in production, How to Run HA Kafka on Amazon EKS, GKE and AKS, which might be handy for you as well.
And if you are interested in performance, then Kubernetes Storage Performance Comparison is a great 10-minute read.
I hope those materials will help you understand Kubernetes storage.

GKE How many Persistent Disks could be attached to a single node?

Is it possible to attach ~30 persistent disks to a single k8s node (e.g. n1-standard-4)?
According to the documentation, a 2-4 core node can support up to 64 attached disks in Beta: Link.
Is it supported by GKE? Is there any limit in GKE Kubernetes?
GKE has the same limitations as vanilla Kubernetes on GCP per se. The Kubernetes limits for the largest public cloud providers are documented here.
You can also change those limits using the KUBE_MAX_PD_VOLS environment variable on the kube-scheduler (after restarting it); a sketch follows. Unfortunately, you won't be able to change this on GKE yet, because GKE doesn't give you access to the masters' configuration.
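As a sketch of what that looks like on a self-managed control plane (not possible on GKE), the variable goes into the kube-scheduler static pod manifest; the path and value here are illustrative.

```yaml
# Illustrative excerpt from /etc/kubernetes/manifests/kube-scheduler.yaml
# on a self-managed master; the kubelet restarts the scheduler on save.
spec:
  containers:
    - name: kube-scheduler
      env:
        - name: KUBE_MAX_PD_VOLS
          value: "64"   # raises the scheduler's attachable-disk cap
```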
Also documented here are Dynamic Volume Limits, introduced in Kubernetes 1.11 and currently in Beta.
I believe you self-answered your first question: the n1-standard-4 VM has 4 vCPUs, and per the link that you provided you can attach up to 64 disks. So yes, you should be able to attach ~30 persistent disks; a PVC/PV in the GCE storage class maps to a GCP VM disk.