How to achieve the association of file and memory in K8S? - kubernetes

In k8s, we can define an emptyDir volume with the memory medium (a tmpfs instance) and mount it into a pod's container. Inside the container, we can then read and write data through the ordinary file interface.
I want to know how k8s achieves this association between file and memory. What is the principle behind reading and writing memory data as files? mmap?

According to Wikipedia:
tmpfs is a temporary file storage paradigm implemented in many Unix-like operating systems. It is intended to appear as a mounted file system, but data is stored in volatile memory instead of a persistent storage device. A similar construction is a RAM disk, which appears as a virtual disk drive and hosts a disk file system.
So it's not a k8s feature. It is a Linux feature that just happens to be used by k8s.
You can read more about it in the Linux kernel documentation.
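For reference, a pod spec for such a volume looks roughly like the sketch below (the pod and volume names are just placeholders). The kubelet simply mounts a tmpfs instance at the requested path, so any file the container writes there lives in the kernel's page cache (RAM) rather than on disk; reading and writing it as a file, including via mmap, is handled entirely by the tmpfs driver, not by anything Kubernetes-specific.

apiVersion: v1
kind: Pod
metadata:
  name: memory-volume-demo          # placeholder name
spec:
  containers:
  - name: app
    image: busybox
    command: ["sh", "-c", "echo hello > /cache/data.txt && sleep 3600"]
    volumeMounts:
    - name: cache
      mountPath: /cache             # reads and writes here go to RAM via tmpfs
  volumes:
  - name: cache
    emptyDir:
      medium: Memory                # back the emptyDir with tmpfs instead of node disk
      sizeLimit: 64Mi               # optional cap on how much memory the volume may consume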

Related

Kubernetes: in-memory shared cache between pods

I am looking for any existing implementation of sharing a read-only in-memory cache across pods on the same node. This setup would allow fast access without the need to load the entire cache into each pod's memory.
Example: a 1GB lookup dictionary is kept up to date, and each pod has read access to the data, allowing fast lookups without effectively cloning the data into memory. So the end result would be just 1GB of memory utilized on the node, and not 1GB * N (number of pods).
Imagined solution for k8s:
A single (Daemon) pod that has the tmpfs volume mounted RW, maintaining up-to-date cache data
Multiple pods that have the same tmpfs volume mounted read-only, mapping the data file(s) to read the data out
Naturally, reading out values and operating on them is expected to create transient memory usage
Notes:
I have found multiple entries regarding volume sharing between pods, but no complete reference solution for the above
tmpfs is ideal for cache read/write speed, though obviously it could also be done over a regular fs
Looking for solutions that can be language specific or agnostic, for reference
A language-specific solution would use a specific language to map the file data as a dictionary / other KV lookup
A language-agnostic, more generalized solution could use SQLite, where the "processes" in our case would be pods
You could use hostIPC and/or hostPath mounted on tmpfs, but that comes with a swathe of issues:
hostPath by itself poses a security risk, and when used, should be scoped to only the required file or directory and mounted as ReadOnly. It also comes with the caveat of not knowing who will get "charged" for the memory, so every pod has to be provisioned to be able to absorb it, depending on how it is written. It also might "leak" up to the root namespace and be charged to nobody, showing up only as "overhead".
hostIPC is controlled by Pod Security Policies, which are deprecated as of 1.21 and will be removed in a future release.
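As a rough illustration of that approach (the names, images, and host path below are made up, and the caveats above still apply), a writer pod, typically run from a DaemonSet, and the reader pods could share a node-level tmpfs directory via hostPath:

# Writer: one per node (e.g. via a DaemonSet), keeps the cache files fresh
apiVersion: v1
kind: Pod
metadata:
  name: cache-writer                # hypothetical name
spec:
  containers:
  - name: writer
    image: example/cache-writer     # hypothetical image
    volumeMounts:
    - name: shared-cache
      mountPath: /cache
  volumes:
  - name: shared-cache
    hostPath:
      path: /mnt/cache-tmpfs        # a directory the node admin has mounted as tmpfs
      type: Directory
---
# Readers: mount the same host directory read-only and mmap the file(s) in it
apiVersion: v1
kind: Pod
metadata:
  name: cache-reader                # hypothetical name
spec:
  containers:
  - name: reader
    image: example/app              # hypothetical image
    volumeMounts:
    - name: shared-cache
      mountPath: /cache
      readOnly: true
  volumes:
  - name: shared-cache
    hostPath:
      path: /mnt/cache-tmpfs
      type: Directory

Because every reader maps the same tmpfs-backed files, the data occupies the node's page cache only once, which is what keeps total usage near the 1GB in the question rather than 1GB per pod.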
Generally, the best idea is to use Redis; it is one of the most widely used tools in such scenarios.

Migrate to kubernetes

We're planning to migrate our software to run in Kubernetes with auto scaling. This is our current infrastructure:
PHP and Apache are running on a Google Compute Engine n1-standard-4 (4 vCPUs, 15 GB memory)
MySQL is running in Google Cloud SQL
Data files (csv, pdf) and the code are stored on a single SSD Persistent Disk
I found many posts that recommend storing the data files in Google Cloud Storage and using the API to fetch files from and upload them to the bucket. We have very limited time, so I decided to use NFS to share the data files across the pods. The problem is that NFS is slow: it's around 100 MB/s when I copy a file with pv, while the result from iperf is 1.96 Gbits/sec. Do you know how to achieve the same result without implementing cloud storage, or how to increase the NFS speed?
Data files (csv, pdf) and the code are stored on a single SSD Persistent Disk
There's nothing stopping you from volume mounting an SSD into the Pod so you can continue to use an SSD. I can only speak to AWS terminology, but some EC2 instances come with "local" SSD hardware, and thus you would only need to use a nodeSelector to ensure your Pods were scheduled onto machines that had said local storage available.
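A hedged sketch of that nodeSelector idea (the label key/value and the host mount path are assumptions; you would label the SSD-equipped nodes yourself):

apiVersion: v1
kind: Pod
metadata:
  name: php-apache                  # placeholder
spec:
  nodeSelector:
    disktype: local-ssd             # hypothetical label applied to nodes that have local SSDs
  containers:
  - name: web
    image: php:apache
    volumeMounts:
    - name: data
      mountPath: /var/www/data
  volumes:
  - name: data
    hostPath:
      path: /mnt/disks/ssd0         # example path where a local SSD might be mounted on the node
      type: Directory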
Where you're going to run into problems is if you are currently just using one php+apache and thus just one SSD, but now you want to scale the application up and it requires that all php+apache have access to the same SSD. That's a classic distributed application architecture problem, and something kubernetes itself can't fix for you.
If you're willing to expend the effort, you can also try any one of the other distributed filesystems (Ceph, GlusterFS, etc) and see if they perform better for your situation. Then again, "We have very limited time" I guess pretty much means that's off the table.
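For completeness, whichever shared filesystem you pick, the Kubernetes side usually boils down to a ReadWriteMany PersistentVolumeClaim that every php+apache pod mounts at the path where the csv/pdf files live; the storage class name below is an assumption and depends on which provisioner/CSI driver you install:

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: shared-data                 # placeholder
spec:
  accessModes:
  - ReadWriteMany                   # many pods, possibly on different nodes, mount it read/write
  storageClassName: cephfs          # hypothetical; provided by whatever shared-filesystem provisioner you deploy
  resources:
    requests:
      storage: 100Gi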

Should I use SSD or HDD as local disks for kubernetes cluster?

Is it worth using SSD as boot disk? I'm not planning to access local disks within pods.
Also, GCP by default creates 100GB disk. If I use 20GB disk, will it cripple the cluster or it's OK to use smaller sized disks?
Why one or the other? Kubernetes (Google Container Engine) is mainly memory and CPU intensive, unless your applications need huge throughput on the hard drives. If you want to save money, you can create tags (labels) on the nodes with HDDs and use node affinity to tweak which pods go where, so you can have a few nodes with SSDs and target them with the affinity tags.
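For example (the label name is made up, and what the answer calls tags are node labels in Kubernetes), you could label the cheaper nodes disktype=hdd and steer I/O-light pods onto them with node affinity:

apiVersion: v1
kind: Pod
metadata:
  name: io-light-app                # placeholder
spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: disktype           # hypothetical label set on the nodes
            operator: In
            values:
            - hdd                   # schedule this pod onto the cheaper HDD nodes
  containers:
  - name: app
    image: nginx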
I would always recommend SSD considering the small difference in price and large difference in performance. Even if it just speeds up the deployment/upgrade of containers.
Reducing the disk size to what is required for running your PODs should save you more. I cannot give a general recommendation for disk size, since it depends on the OS you are using, how many PODs you will end up with on each node, and how big each POD is going to be. To give an example: when I run CoreOS-based images with staging deployments for nginx, PHP and some application servers, I can reduce the disk size to 10 GB with ample free room (both for master and worker nodes). On the extreme side, if I run self-contained Golang application containers with no storage needs, each POD will only require a few MB of space.

EBS or Instance Storage for MongoDB in EC2?

Cassandra recommends using instance local storage for EC2 deployments instead of EBS
I am deploying MongoDB in EC2... should I also be using instance local storage instead of EBS PIOPS?
Here is a slide deck about using a hybrid setup (instance store and PIOPS EBS) for MongoDB on EC2.
http://www.slideshare.net/mongodb/world-high-performance-mongo-db-on-ec2-20140620
Related topic:
Instance store is super fast - https://gist.github.com/ktheory/3c3616fca42a3716346b
Conclusions:
Instance-store is over 5x faster than EBS-SSD for uncached reads.
Instance-store and EBS-SSD are equivalent for cached reads.
Instance-store is over 10x faster than EBS-SSD for writes.
Special notes:
Ephemeral storage or instance-store DOES persist across reboots of an instance! It does not persist across a stop/start, nor a termination, nor some instance hardware failures.
The MongoDB manual has a section with EC2 storage considerations including the general recommendation to use EBS-optimized EC2 instances with provisioned IOPS (PIOPS) EBS volumes.
There are several good reasons to use EBS over local storage:
Local storage (or "Instance Store" in EC2 terms) is ephemeral and introduces potential data loss scenarios on instance stop/start/terminate as well as hardware failure (see AWS docs on Instance Store Lifetime).
While an Instance Store is dedicated to a particular instance, the disk subsystem is shared among instances on the host server hardware. As with regular EBS volumes, contention for a shared resource can lead to unpredictable I/O behaviour. Provisioned IOPS EBS volumes will provide more predictable I/O performance for an active database workload -- no spikes of higher than expected performance, but also no troughs of decreased performance.
The sizes of Instance Stores are determined by the instance type. EBS volumes can be provisioned independently to meet your storage and performance requirements.
If you want to change your instance types, EBS volumes can be re-attached to a new instance in the same availability zone.
EBS volumes can be combined using RAID for additional capacity or redundancy.
EBS volumes support asynchronous snapshots, which are a common backup strategy.
EBS volumes can support encryption for data at rest for most instance types.
EBS is recommended because it is backed by more than one physical drive, with a 2 ms transaction commit between mirror drives. EBS itself is fast enough and can reach 500+ MB/sec for reads and writes.
The Linux kernel is what affects IOPS dramatically; see what the Pinterest engineers investigated:
Final choice: kernel 3.18.7 + XFS + 64K RAID block size.
• Best overall performance for async random read.
• Very competitive performance everywhere else.
• Networking-related kernel bugs (Xen-specific) in 3.13 that aren't fixed until 3.16.
https://www.percona.com/live/mysql-conference-2015/sites/default/files/slides/all_your_iops_are_belong_to_usPLMCE2015.pdf

GCE Use Cloud Storage Bucket as Mounted Drive

Is there a way to mount a storage bucket to an instance so it can be used by the webserver as storage? If not, how can I add more storage to the instance without adding another persistent disk with an OS?
Aside from attaching a new persistent disk, you could also use a number of FUSE based utilities to mount either a Google Cloud Storage or AWS S3 bucket as a local disk.
s3fs:
* Can work with Google Cloud or AWS
* Bucket can be mounted on multiple systems at the same time
* Files are stored as objects in the bucket, so the files can be manipulated externally
* A con is that it can be a little bit slow if you have a lot of files
S3QL:
* Can work with Google Cloud or AWS
* Bucket can be mounted on only one system
* Files are stored in a proprietary format and can't be manipulated outside of the mounted filesystem
* Much faster than s3fs for many files
* Doesn't handle network connectivity issues so well (manual fsck and remount if you lose the network).
Hope this helps.
You can certainly create a new (larger) Persistent Disk and attach it to your instance as a data disk. This is a very good option, since it keeps your website data separate from your operating system. See the Persistent Disk docs for details on all the options.
In your case:
Create a new Persistent Disk for the data. Pick a size large enough for your data and large enough to get the I/O throughput you want. (See this chart for details)
Attach the disk to your instance.