We're planning to migrate our software to run in kubernetes with auto scalling, this is our current infrastructure:
PHP and apache are running in Google Compute Engine n1-standard-4 (4 vCPUs, 15 GB memory)
MySql is running in Google Cloud SQL
Data files (csv, pdf) and the code are storing in a single SSD Persistent Disk
I found many posts that recomments to store the data file in the Google Cloud Storage and use the API to fetch the file and uploading to the bucket. We have very limited time so I decide to use NFS to share the data files over the pods, the problem is nfs speed is slow, it's around 100mb/s when I copying the file with pv, the result from iperf is 1.96 Gbits/sec.Do you know how to achieve the same result without implement the cloud storage? or increase the NFS speed?
Data files (csv, pdf) and the code are storing in a single SSD Persistent Disk
There's nothing stopping you from volume mounting an SSD into the Pod so you can continue to use an SSD. I can only speak to AWS terminology, but some EC2 instances come with "local" SSD hardware, and thus you would only need to use a nodeSelector to ensure your Pods were scheduled onto machines that had said local storage available.
Where you're going to run into problems is if you are currently just using one php+apache and thus just one SSD, but now you want to scale the application up and it requires that all php+apache have access to the same SSD. That's a classic distributed application architecture problem, and something kubernetes itself can't fix for you.
If you're willing to expend the effort, you can also try any one of the other distributed filesystems (Ceph, GlusterFS, etc) and see if they perform better for your situation. Then again, "We have very limited time" I guess pretty much means that's off the table.
Related
I recently upgraded Fedora Server (F) v36 to v37. The F36 server had a volume group consisting of three physical drives combined to form a storage pool, which I named “BigDrive”. During the upgrade the logical volume information seems to have been lost and BigDrive didn’t appear or mount in the F37 server. I’ve been unable to find any backup logical volume information. At present the 3 drives are installed on the F37 server. I would welcome advise on how to recombine the three drives, recover the logical volume information, and access the data stored in the shared pool. Can anyone suggest a process to do that, or a utility that could rebuild the storage pool from the physical drives?
I haven't found any helpful information in the various Fedora documentation or usual websites that don't reference using the backed up logical volume information which somehow didn't survive the upgrade process. This is because the OS hard drive was wiped and repartitioned as part of the upgrade. The drives that formed the storage pool were not formatted, nor do they store any OS or application files. They were purely data storage.
I'm preparing my production environment on the Hetzner cloud, but I have some doubts (I'm more a developer than a devops).
I will get 3 servers for the replicaset with 8 core, 32 Gb ram and 240 gb ssd. I'm a bit worried about the size of the ssd the server comes with and Hetzner has the possibility to create volumes to be attached to the servers. Since mongodb uses a single folder for the db data, I was wondering how can I use the 240 gb that comes with the server in combination with external volumes. At the beginning I can use the 240 gb, but then I will have to move the data folder to a volume when it reaches capacity. Im fine with this, but it looks to me that when I will move to volumes, this 240gb will not be used anymore (yes I can use them to save the mongo journaling as they suggest to store it in a separate partition).
So, my noob question is, how can I use both the disk that comes with the server and the external volumes?
Thank you
Is it worth using SSD as boot disk? I'm not planning to access local disks within pods.
Also, GCP by default creates 100GB disk. If I use 20GB disk, will it cripple the cluster or it's OK to use smaller sized disks?
Why one or the other?. Kubernetes (Google Conainer Engine) is mainly Memory and CPU intensive unless your applications need a huge throughput on the hard drives. If you want to save money you can create tags on the nodes with HDD and use the node-affinity to tweak which pods goes where so you can have few nodes with SSD and target them with the affinity tags.
I would always recommend SSD considering the small difference in price and large difference in performance. Even if it just speeds up the deployment/upgrade of containers.
Reducing the disk size to what is required for running your PODs should save you more. I cannot give a general recommendation for disk size since it depends on the OS you are using and how many PODs you will end up on each node as well as how big each POD is going to be. To give an example: When I run coreOS based images with staging deployments for nginx, php and some application servers I can reduce the disk size to 10gb with ample free room (both for master and worker nodes). On the extreme side - If I run self-contained golang application containers without storage need, each POD will only require a few MB space.
I am trying to understand how horizontal scalability (virtualization) is working in terms of disk storage.
virtualization is a layer upon the hardware computer nodes and manage the needed resources for the requests.
So my question is what happens when I deploy my war into the web server for example ? I mean I have a replicated storage in different nodes?
After I did some researches I found NAS vs SAN. so i expect to have SAN replication for data stability .... that is true ?
Where is my storage disk exactly when I have a horizontal based server like Google Engine or AWS?
Thanks,
Hopefully a couple of these examples will help. Let's take a general, crude example. I'll try to keep information simple to understand. Let's say I have a business running on LAMP stack. Apache+PHP is running on WEB1 server, MySQL on DB1 server. Customer data sits on WEB1.
SAN replication
First - your question about replication. That's mostly for disaster recovery. For data stability/reliability, SAN have appropriate RAID levels, service level agreements and spare disks. For example RAID5 tolerates failure of 1 disk in a raid-set. RAID6 tolerates failure of 2 disks in a raid-set etc. Having hot-spare disks help in quick repopulation of failed RAID disk. Organizations also snapshot their disk volumes and replay them in a different data-center so as to have a 2nd copy of their data. This is done over and above regular backups and VM snapshots.
AWS disks
There are 2 types of disks AWS has:
Ephemeral: disks connected to EC2 instance
Elastic Block Storage (EBS)
Ephemeral storage
Don't use this for anything critical. AWS offers EC2 instances with ephemeral storage (that means, VM has disks attached to the server) and recommends that users purchase slice of their disk in the form of EBS (Elastic Block Storage). I'd chose to not run anything on ephemeral storage because if EC2 instance stops, information on ephemeral storage is gone! However, if my partitions were on EBS volume, EC2 restart will be seamless. All data will stay alive on my EBS volume.
EBS
When I want a VM, I'd choose an EC2 instance (CPU/Memory). Then I buy disk in the form of EBS volume of 100GB (or more if I want to do RAID/LVM etc.) and attach it to my EC2 instance. Now I can install OS on my EC2 volume. Partitions are all created on my EBS volume. When EC2 reboots, my data stays as-is.
Disk scaling
Let's say I began my business with an EC2 instance + 100GB of EBS volume. All's well until my customers began to upload really large files. My disk is getting full and I need to expand a partition. With AWS, I could buy another slice of 100GB of EBS volume and expand my partition to use this additional 100GB.
Server scaling
Let's say my business is doing really well and my EC2 instance isn't keeping up with traffic. I need more horse-power and I choose to add another server WEB2 running Apache+PHP server with its own EBS volume. But what about customer data? Will I store some data on WEB1 and some on WEB2? That'd be hard to reconcile.
Keeping code same on WEB1 and WEB2
Code from Git (or version control of choice) will be deployed to both WEB1 and WEB2 simultaneously. That will keeps both my server's code up to date. Configuration management of my servers can happen through Ansible/Puppet/Chef.
Streamlining data storage
I have some options. Let's discuss two options that will allow WEB1 and WEB2 to share data/disk space. Important note - EBS volume cannot be shared with multiple EC2 instances. EBS volume can be attached to only one EC2 instance.
First option - stand up another server DATA1 and attach a large EBS volume to it and move customer files there. WEB1 and WEB2 will send customer data to DATA1 (rsync/ftp/scp). WEB1 and WEB2 will read/write from DB1 database also. I could even safeguard my data by taking snapshots of EBS volume and replaying the snapshot on another server called DATA2 in a different AWS region or availability zone in case DATA1 is unavailable.
Second option - AWS has S3 storage. It's reliable and cheaper than EBS. Instead of standing up DATA1 and DATA2, it is much easier and cheaper to create a bucket on S3 and store customer data there. WEB1 and WEB2 can read/write to S3 seamlessly.
Where're my disks on AWS?
I don't know, and I don't need to know. AWS must have racks and racks of disks. I am getting a slice of disk space from somewhere there. Their disks are likely to have redundancy but EBS failures are possible. For our own sanity, it is good to RAID and snapshot EBS volumes over and above taking regular backups.
Similar to disks, AWS must have racks and racks of servers. I am getting a virtual machine in the form of EC2 instance of my choice from somewhere in those racks. When I shutdown and restart EC2 server, I may get the same specification VM from a different rack. However, when my EBS volume will remain the same unless I terminate EBS volume and buy a new EBS volume.
One thing to recognize is that if I bought EC2 instance in Oregon, my EBS volume will be in the same Oregon region and also the same availability zone.
Note - this is a very generic answer.
Is there a way to mount a storage bucket to an instance so it can be used by the webserver as storage? If not, how can I add more storage to the instance without adding another persistent disk with an OS?
Aside from attaching a new persistent disk, you could also use a number of FUSE based utilities to mount either a Google Cloud Storage or AWS S3 bucket as a local disk.
s3fs:
*Can work with Google Cloud or AWS
*Bucket can be mounted on multiple systems at same time
*Files are stored as objects on the bucket, so the files can be manipulated externally
*A con is that it can be a little bit slow if you have a lot of files
S3QL:
*Can work with Google Cloud or AWS
*Bucket can be mounted on one system
*Files are stored in a proprietary format, can't be manipulated outside of the mounted filesystem
*Much faster than s3fs for many files
*Doesn't handle network connectivity issues so well (manual fsck and remount if you lose network).
Hope this helps.
You can certainly create a new (larger) Persistent Disk and attach it to your instance as a data disk. This is a very good option, since it keeps your website data separate from your operating system. See the Persistent Disk docs for details on all the options.
In your case:
Create a new Persistent Disk for the data. Pick a size large enough for your data and large enough to get the I/O throughput you want. (See this chart for details)
Attach the disk to your instance.