I have 3 virtual servers with a lot of storage - can I use the storage in a reliable way somehow? - kubernetes

Here's the deal:
I have 3 virtual servers (K8s worker nodes) with 160 GB of virtual SSD each. Together they form a private network, and I would like to use the space for persistent volume claims.
Preferably in a way similar to RAID 1 or RAID 5.
Installing Rook (rook.io) is not possible because these are not bare-metal machines.
Is there some way to have "RAID over NFS" or something similar in K8s? I'm running RKE.

Related

Rightsizing Kubernetes Nodes | How much cost do we save when we switch from VMs to containers?

We are running 4 different micro-services in 4 different EC2 Auto Scaling groups:
service-1 - vCPU: 4, RAM: 32 GB, VM count: 8
service-2 - vCPU: 4, RAM: 32 GB, VM count: 8
service-3 - vCPU: 4, RAM: 32 GB, VM count: 8
service-4 - vCPU: 4, RAM: 32 GB, VM count: 16
We are planning to migrate this workload to EKS (in containers).
We need help in deciding the right node configuration (in EKS) to start with.
We can start with a small machine (vCPU: 4, RAM: 32 GB), but we will not get any cost savings, as each container will need a separate VM.
We can use a large machine (vCPU: 16, RAM: 128 GB), but when these machines scale out, each newly added machine will be large and thus may be underutilized.
Or we can go with a medium machine (vCPU: 8, RAM: 64 GB).
Other than this recommendation, we were also evaluating the cost saving of moving to containers.
As per our understanding, every VM comes with the following overhead:
Overhead of running the hypervisor/virtualization
Overhead of running a separate operating system
Note: One large VM vs. many small VMs costs the same on a public cloud, as cost is based on the number of vCPUs + RAM.
The hypervisor/virtualization cost is only relevant if we are running on-prem, so there is no need to consider it.
On the second point, how many resources does a typical Linux machine need just to run the OS? If we provision a small machine (vCPU: 2, RAM: 4 GB), approximate CPU usage is 0.2% and memory consumption (outside user space) is about 500 MB.
So running large instances (5 instances instead of 40 small ones) saves about 35 times this CPU and RAM overhead, which does not seem significant.
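For reference, a quick back-of-the-envelope check of that saving, using only the approximate figures quoted above (they are the question's assumptions, not measurements):

```python
# Rough check of the OS-overhead saving described in the question.
# The ~500 MB RAM per VM figure is the approximation quoted above.
small_vm_count = 40
large_vm_count = 5
ram_overhead_gb_per_vm = 0.5     # OS memory overhead per VM (assumed, from the question)

vms_removed = small_vm_count - large_vm_count        # 35 fewer OS copies
ram_saved_gb = vms_removed * ram_overhead_gb_per_vm  # ~17.5 GB in total
print(f"{vms_removed} fewer OS instances, roughly {ram_saved_gb:.1f} GB of RAM saved")
```

Against the roughly 1.3 TB of RAM across the 40 VMs (40 × 32 GB), that is on the order of 1%, which supports the "does not seem significant" observation.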
You are unlikely to see any cost savings in resources when you move to containers in EKS from applications running directly on VMs.
A Linux container is just an isolated Linux process with specified resource limits; it is no different from a normal process when it comes to resource consumption. EKS still uses virtual machines to provide compute to the cluster, so you will still be running processes on a VM regardless of containerization, and from a resource point of view it will be equal. (See this answer for a more detailed comparison of VMs and containers.)
When you add Kubernetes to the mix you are actually adding more overhead compared to running directly on VMs. The Kubernetes control plane runs on a set of dedicated VMs. In EKS those are fully managed as a PaaS, but Amazon charges a small hourly fee for each cluster.
In addition to the dedicated control plane nodes, each worker node in the cluster needs a set of programs (system pods) to function properly (kube-proxy, kubelet, etc.), and you may also define containers that must run on each node (daemon sets), like log collectors and security agents.
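If you want to see this per-node overhead on a running cluster, one quick way is to list the DaemonSets and their resource requests. A minimal sketch using the Kubernetes Python client (it assumes a local kubeconfig and is only an illustration, not part of the original answer):

```python
# List every DaemonSet in the cluster and the CPU/memory its first container
# requests on each node - this is part of the per-node overhead described above.
from kubernetes import client, config

config.load_kube_config()          # assumes kubectl access to the cluster
apps = client.AppsV1Api()

for ds in apps.list_daemon_set_for_all_namespaces().items:
    requests = ds.spec.template.spec.containers[0].resources.requests
    print(f"{ds.metadata.namespace}/{ds.metadata.name}: requests={requests}")
```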
When it comes to sizing the nodes you need to find a balance between scaling and cost optimization.
The larger the worker node is, the smaller the relative overhead of system pods and daemon sets becomes. In theory, a worker node large enough to accommodate all your containers would maximize the share of resources consumed by your applications relative to the supporting software on the node.
The smaller the worker nodes are the smaller the horizontal scaling steps can be, which is likely to reduce waste when scaling. It also provides better resilience as a node failure will impact fewer containers.
I tend to prefer nodes that are small so that scaling can be handled efficiently. They should be slightly larger than what is required by the largest containers, so that the system pods and daemon sets can also fit.
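To make that balance concrete, here is a rough sizing sketch; the per-node overhead figures are assumptions picked for illustration, not EKS defaults:

```python
# Share of a node left for application pods once a fixed per-node system
# overhead (kubelet, kube-proxy, daemon sets) is reserved.
# The overhead numbers below are assumptions for the example.
overhead_cpu, overhead_ram_gb = 0.5, 1.5

node_shapes = [("small", 4, 32), ("medium", 8, 64), ("large", 16, 128)]

for name, cpu, ram_gb in node_shapes:
    cpu_share = (cpu - overhead_cpu) / cpu
    ram_share = (ram_gb - overhead_ram_gb) / ram_gb
    print(f"{name:6s}: {cpu_share:.0%} of CPU and {ram_share:.0%} of RAM "
          f"available for application pods")
```

The relative overhead shrinks as nodes grow, which is the first point above; the second point (finer scaling steps, smaller blast radius) is what pushes back toward smaller nodes.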

MongoDB on the cloud

I'm preparing my production environment on the Hetzner cloud, but I have some doubts (I'm more of a developer than a devops person).
I will get 3 servers for the replica set, each with 8 cores, 32 GB RAM and a 240 GB SSD. I'm a bit worried about the size of the SSD the server comes with, and Hetzner makes it possible to create volumes that can be attached to the servers. Since MongoDB uses a single folder for the db data, I was wondering how I can use the 240 GB that comes with the server in combination with external volumes. At the beginning I can use the 240 GB, but then I will have to move the data folder to a volume when it reaches capacity. I'm fine with this, but it looks to me that once I move to volumes, the 240 GB will not be used anymore (yes, I could use it to store the MongoDB journal, as they suggest keeping it on a separate partition).
So, my noob question is: how can I use both the disk that comes with the server and the external volumes?
Thank you

Should I use SSD or HDD as local disks for kubernetes cluster?

Is it worth using an SSD as the boot disk? I'm not planning to access local disks within pods.
Also, GCP by default creates a 100 GB disk. If I use a 20 GB disk, will it cripple the cluster, or is it OK to use smaller disks?
Why one or the other? Kubernetes (Google Container Engine) is mainly memory and CPU intensive, unless your applications need huge throughput on the hard drives. If you want to save money, you can create tags (labels) on the nodes with HDDs and use node affinity to tweak which pods go where, so you can have a few nodes with SSDs and target them with the affinity tags.
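To make the tagging/affinity suggestion concrete, here is a minimal sketch using the Kubernetes Python client; the disk-type label and all node/pod names are made up for the example and are not something GKE sets for you:

```python
# Label an SSD-backed node and pin an I/O-heavy pod to SSD nodes only.
from kubernetes import client, config

config.load_kube_config()
core = client.CoreV1Api()

# Hypothetical node name and label key.
core.patch_node("gke-cluster-ssd-pool-node-1",
                {"metadata": {"labels": {"disk-type": "ssd"}}})

pod = client.V1Pod(
    metadata=client.V1ObjectMeta(name="io-heavy-app"),
    spec=client.V1PodSpec(
        node_selector={"disk-type": "ssd"},   # simplest form of node affinity
        containers=[client.V1Container(name="app", image="nginx")],
    ),
)
core.create_namespaced_pod(namespace="default", body=pod)
```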
I would always recommend SSD, considering the small difference in price and the large difference in performance, even if it just speeds up the deployment/upgrade of containers.
Reducing the disk size to what is required for running your pods should save you more. I cannot give a general recommendation for disk size, since it depends on the OS you are using, how many pods end up on each node, and how big each pod is going to be. To give an example: when I run CoreOS-based images with staging deployments for nginx, PHP and some application servers, I can reduce the disk size to 10 GB with ample free room (both for master and worker nodes). On the extreme side, if I run self-contained Golang application containers with no storage needs, each pod will only require a few MB of space.

Getting access to a SAN disk / LUN from a virtual machine. Is it possible?

Resources:
node1: Physical cluster node 1.
node2: Physical cluster node 2.
cluster1: Cluster containing node1 and node2 used to host virtual machines.
san1: Dell MD3200 highly available storage device (SAN).
lun1: A LUN dedicated to file server storage, located on san1.
driveZ: A 100 GB hard drive, currently a resource on node1, with the drive letter Z:\. This drive is lun1, which resides on san1.
virtual1: A virtual server used as a file server only.
Synopsis / Goals:
I have two nodes/servers on my network. These two nodes (node1 and node2) are part of a cluster (cluster1) that is used for hosting all my virtual machines. There is a SAN involved (san1) that has many LUNs created on it, one of which (lun1) will be used to store all data dedicated to a virtual machine (virtual1). Eventually lun1 is created, given the name "storage", and used strictly by the virtual machine "virtual1" to store and access data.
What I have currently in place:
- I currently have created the SAN (san1), created a disk group with the virtual disk (storage), and assigned a LUN (lun1) to it.
- I have set up two physical servers that are connected to the SAN via SAS cables (multipath).
- I have set up the clustering feature on those two servers and have the Hyper-V role installed on each as well.
- I have created a cluster (cluster1) with server members node1 and node2.
- I have created a virtual server (virtual1) and made it highly available on the cluster (cluster1).
Question:
Is it possible to have lun1 (driveZ) brought up and accessed by virtual1?
What I have tried:
I had lun1 (aka driveZ) showing up in node1's Disk Management. I then added it as a resource to the cluster storage area. I tried two different things. (1) I tried to add it as a Cluster Shared Volume; shortly after, I realized that only the cluster members could see/access it, and not the virtual machines, even though they were created as a service in the cluster. (2) I tried to move the resource (driveZ) to the virtual machine (virtual1) within cluster1. After doing that, I went into the virtual machine settings and added the drive as a SCSI drive (using lun1 @ 100 GB) and refreshed Disk Management on the virtual machine (virtual1). The drive showed up and allowed me to assign a drive letter, then asked me if I wanted to format it... What about all my data that's on it? Was that a bust? Anyway, that's where I'm at right now... Ideas?
Thoughts:
Just so I'm clear, all of this is for testing at the moment... Actual sizes of resources in production differ greatly. I was thinking about adding driveZ (lun1) as a Cluster Shared Volume, and then adding a new Hyper-V virtual SCSI drive (say 50 GB, so later I can try to expand it to 100 GB, the full size of the physical/SAN drive) to my VM, storing the fixed VHD (virtual hard disk) inside the Cluster Shared Volume "driveZ". I'm testing it out now... But I have concerns... 1) What happens when I try to create a really large VHD (around 7 TB)? 2) Can the fixed-disk VHD be expanded in any way? I plan on making my new SAN virtual disk larger than 7 TB in the future... Currently it's going to stay at 7 TB, but that will expand at some point...
Figured it out!
The correct way to do it is...
Set up a SAN, create a disk group with two virtual disks, and assign LUNs to them.
Set up your 2 physical servers with Windows Server 2008 R2 and connect them both to the SAN.
Add the Failover Clustering feature and the Hyper-V role to both servers.
For the two drives (from the SAN), bring them online and initialize them both. Create a simple volume on each drive if you wish, and even format them if you want.
Create a cluster and add one of the virtual disks from the SAN as a Cluster Shared Volume. This will be used to store the virtual machines.
Create a virtual machine and store it on the CSV (e.g. C:\ClusterStorage\Volume1\), then power it up.
Take the second drive offline. It should just be a drive on the host server, and it has to be offline! After you right-click and choose Offline, right-click again and go to Properties. On that page, look for the LUN number and write it down.
Open the VM settings, go down to the SCSI controller and add a drive. Choose the physical drive option and select the correct LUN number. Hit OK and it should show up in the VM Storage Manager.
As helpful resources, check these pages out...
Configuring Disks and Storage
Hyper-V Clustering Video 1
Hyper-V Clustering Video 2
Hyper-V Clustering Video 3
Hyper-V Clustering Video 4
Hyper-V Clustering Video 5

Do you need to run RAID 10 on Mongo when using Provisioned IOPS on Amazon EBS?

I'm trying to set up a production MongoDB system on Amazon to use as a datastore for a realtime metrics system.
I initially used the MongoDB AMIs[1] in the Marketplace, but I'm confused in that there is only one data EBS volume. I've read that Mongo recommends RAID 10 on EBS storage (8 EBS volumes on each server). Additionally, I've read that the bare minimum for production is a primary/secondary with an arbiter. Is RAID 10 still the recommended setup, or is one Provisioned IOPS EBS volume sufficient?
Please advise. We are a small shop, so what is the bare minimum we can get away with and still be reasonably safe?
[1] MongoDB 2.4 with 1000 IOPS - data: 200 GB @ 1000 IOPS, journal: 25 GB @ 250 IOPS, log: 10 GB @ 100 IOPS
So, I just got off a call with an Amazon systems engineer, and he had some interesting insights related to this question.
First off, if you are going to use RAID, he said to simply do striping, as the EBS blocks are mirrored behind the scenes anyway, so RAID 10 seemed like overkill to him.
Standard EBS volumes tend to handle spiky traffic well (they may be able to handle 1K-2K IOPS for a few seconds), but eventually they tail off to an average of about 100 IOPS. One suggestion was to use many small EBS volumes and stripe them to get better IOPS throughput.
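As a rough illustration of the striping suggestion, sustained throughput scales roughly with the number of volumes in the stripe (using the ~100 IOPS steady-state figure quoted above; real numbers will vary):

```python
# Approximate sustained IOPS of a stripe of standard EBS volumes,
# using the ~100 IOPS steady-state figure mentioned above.
volumes_in_stripe = 8
sustained_iops_per_volume = 100
print(volumes_in_stripe * sustained_iops_per_volume)   # ~800 sustained IOPS
```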
Some of his customers use just the ephemeral storage on the EC2 instances, but then have multiple (3-5) nodes in the availability set. The ephemeral storage is the storage on the physical machine. Apparently, if you use an EC2 instance with SSD storage, you can get up to 20K IOPS.
Some customers will use a huge EC2 instance with SSD for the master, then a smaller EC2 instance with EBS for the secondary. The primary machine is performant, and the failover is available but has degraded performance.
Make sure you check 'EBS Optimized' when you spin up an instance. That means you have a dedicated channel to the EBS storage (of any kind) instead of sharing the NIC.
Important! Provisioned IOPS EBS is expensive, and the bill does not stop when you shut down the EC2 instances it is attached to (this sucks while you are testing). His advice was to take a snapshot of the EBS volumes, then delete them. When you need them again, just create new Provisioned IOPS EBS volumes, restore the snapshot, then reconfigure your EC2 instances to attach the new storage. (It's more work than it should be, but it's worth it not to get sucker-punched with the IOPS bill.)
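A sketch of that snapshot-and-recreate workflow with boto3; the volume/instance IDs, region, availability zone, device name, and the io1/1000 IOPS settings are all placeholders for illustration:

```python
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")       # example region
data_volume = "vol-0123456789abcdef0"                    # placeholder volume ID

# 1. Snapshot the Provisioned IOPS volume before pausing the test environment.
snap = ec2.create_snapshot(VolumeId=data_volume,
                           Description="mongo data - paused test environment")
ec2.get_waiter("snapshot_completed").wait(SnapshotIds=[snap["SnapshotId"]])

# 2. Delete the expensive PIOPS volume so it stops billing.
ec2.delete_volume(VolumeId=data_volume)

# 3. Later: recreate a PIOPS volume from the snapshot and attach it again.
vol = ec2.create_volume(AvailabilityZone="us-east-1a",
                        SnapshotId=snap["SnapshotId"],
                        VolumeType="io1", Iops=1000)
ec2.get_waiter("volume_available").wait(VolumeIds=[vol["VolumeId"]])
ec2.attach_volume(VolumeId=vol["VolumeId"],
                  InstanceId="i-0123456789abcdef0", Device="/dev/sdf")
```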
I've got the same question. Both Amazon and MongoDB market Provisioned IOPS heavily, touting its advantages over a standard EBS volume. We run prod instances on m2.4xlarge AWS instances with 1 primary and 2 secondaries set up per service. In the most heavily utilized service cluster, apart from a few slow queries, the monitoring charts do not reveal any drop in performance at all. Page faults are rare, at 0.0001 to 0.0004 faults once or twice a day. Background flushes are in milliseconds, and locks and queues are so far at manageable levels. I/O wait on the primary node ranges between 0 and 2% at any time, mostly less than 1%, and %idle stays steadily above the 90% mark. Do I still need to consider Provisioned IOPS, given we still have budget to address any potential performance drag? Any guidance will be appreciated.