Is there a way to mount a storage bucket to an instance so it can be used by the webserver as storage? If not, how can I add more storage to the instance without adding another persistent disk with an OS?
Aside from attaching a new persistent disk, you could also use one of several FUSE-based utilities to mount either a Google Cloud Storage or an AWS S3 bucket as a local disk.
s3fs:
*Can work with Google Cloud or AWS
*Bucket can be mounted on multiple systems at same time
*Files are stored as objects on the bucket, so the files can be manipulated externally
*A con is that it can be a little bit slow if you have a lot of files
S3QL:
*Can work with Google Cloud or AWS
*Bucket can be mounted on only one system at a time
*Files are stored in an S3QL-specific format, so they can't be manipulated outside of the mounted filesystem
*Much faster than s3fs for many files
*Doesn't handle network connectivity issues so well (manual fsck and remount if you lose network).
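As a rough illustration of the s3fs option, here is a minimal Python sketch that only builds the mount command (it does not run it). The bucket name, mount point and credentials file path are placeholders, and the `url=` option pointing at the Google Cloud Storage endpoint assumes you have interoperability (HMAC) keys for the bucket.

```python
import shlex

# Google Cloud Storage's S3-compatible endpoint (used only when use_gcs=True).
GCS_ENDPOINT = "https://storage.googleapis.com"


def s3fs_mount_command(bucket, mount_point, passwd_file, use_gcs=False):
    """Build the argv for an s3fs mount. Nothing is executed here."""
    cmd = ["s3fs", bucket, mount_point, "-o", f"passwd_file={passwd_file}"]
    if use_gcs:
        # Point s3fs at Google Cloud Storage instead of AWS S3.
        cmd += ["-o", f"url={GCS_ENDPOINT}"]
    return cmd


if __name__ == "__main__":
    # "my-bucket", "/mnt/bucket" and "/etc/passwd-s3fs" are placeholders.
    cmd = s3fs_mount_command("my-bucket", "/mnt/bucket", "/etc/passwd-s3fs", use_gcs=True)
    print(shlex.join(cmd))
```

You would run the printed command yourself (or pass the list to `subprocess.run`) once the credentials file exists and has restrictive permissions.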
Hope this helps.
You can certainly create a new (larger) Persistent Disk and attach it to your instance as a data disk. This is a very good option, since it keeps your website data separate from your operating system. See the Persistent Disk docs for details on all the options.
In your case:
Create a new Persistent Disk for the data. Pick a size large enough for your data and large enough to get the I/O throughput you want. (See this chart for details)
Attach the disk to your instance.
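The two steps above could look roughly like this with the gcloud CLI. This is only a sketch that assembles the commands; the disk name, instance name, zone and size are placeholders you would replace with your own.

```python
import shlex


def data_disk_commands(disk, instance, zone, size_gb, disk_type="pd-ssd"):
    """Return the gcloud commands to create a data disk and attach it.

    The commands are returned as argv lists; nothing is executed here.
    """
    create = [
        "gcloud", "compute", "disks", "create", disk,
        f"--size={size_gb}GB",
        f"--type={disk_type}",
        f"--zone={zone}",
    ]
    attach = [
        "gcloud", "compute", "instances", "attach-disk", instance,
        f"--disk={disk}",
        f"--zone={zone}",
    ]
    return [create, attach]


if __name__ == "__main__":
    # "web-data", "web-1" and "us-central1-a" are placeholder names.
    for cmd in data_disk_commands("web-data", "web-1", "us-central1-a", 200):
        print(shlex.join(cmd))
```

After attaching, the disk still has to be formatted and mounted inside the instance before the webserver can use it.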
In k8s, we can use the Memory medium (a tmpfs instance) to define an emptyDir volume and mount it into a pod's container. Inside the container, we can read and write data through the normal file interface.
I want to know how k8s associates the files with memory. What is the principle behind reading and writing memory data as files? mmap?
According to Wikipedia:
tmpfs is a temporary file storage paradigm implemented in many Unix-like operating systems. It is intended to appear as a mounted file system, but data is stored in volatile memory instead of a persistent storage device. A similar construction is a RAM disk, which appears as a virtual disk drive and hosts a disk file system.
So it's not a k8s feature. It is a Linux feature that k8s simply makes use of.
You can read more about it in the Linux kernel documentation.
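For reference, this is roughly what the k8s side looks like: the pod just asks for an emptyDir with `medium: Memory`, and the kubelet mounts a tmpfs for it. The sketch below builds such a manifest as a Python dict (kubectl also accepts JSON); the pod name, image, mount path and size limit are placeholders.

```python
import json


def memory_backed_pod(name, image, mount_path, size_limit="64Mi"):
    """Build a Pod manifest whose scratch volume is a tmpfs-backed emptyDir."""
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            "containers": [
                {
                    "name": name,
                    "image": image,
                    # The container sees an ordinary directory at mount_path.
                    "volumeMounts": [{"name": "scratch", "mountPath": mount_path}],
                }
            ],
            "volumes": [
                {
                    "name": "scratch",
                    # medium "Memory" asks for tmpfs instead of node disk.
                    "emptyDir": {"medium": "Memory", "sizeLimit": size_limit},
                }
            ],
        },
    }


if __name__ == "__main__":
    # "demo", "busybox" and "/cache" are placeholder values.
    print(json.dumps(memory_backed_pod("demo", "busybox", "/cache"), indent=2))
```

Inside the container, plain `open`/`read`/`write` calls on files under the mount path work as usual; the kernel keeps the data in memory rather than on a block device.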
When mounting GCS through FUSE with gcsfuse, are the files stored under the mount point saved on the local disk file system (meaning they consume actual disk space), or is all data stored directly in the cloud?
gcsfuse downloads files to a temporary location and keeps a cache. This is usually the right thing, because otherwise you could use up all your available RAM. If you want, you can prevent storing a local copy on disk by setting --temp-dir to a ramdisk.
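A sketch of the ramdisk variant, assuming a tmpfs mount is used as the ramdisk. The code only assembles the two commands; the bucket name, paths and tmpfs size are placeholders, and flag behaviour may differ between gcsfuse versions.

```python
import shlex


def ramdisk_gcsfuse_commands(bucket, mount_point, ramdisk="/mnt/ramdisk", size="512m"):
    """Return commands to create a tmpfs ramdisk and mount a bucket using it.

    Nothing is executed; the result is a list of argv lists.
    """
    make_ramdisk = ["mount", "-t", "tmpfs", "-o", f"size={size}", "tmpfs", ramdisk]
    # --temp-dir tells gcsfuse where to keep its temporary local copies.
    mount_bucket = ["gcsfuse", "--temp-dir", ramdisk, bucket, mount_point]
    return [make_ramdisk, mount_bucket]


if __name__ == "__main__":
    # "my-bucket" and "/mnt/gcs" are placeholder values.
    for cmd in ramdisk_gcsfuse_commands("my-bucket", "/mnt/gcs"):
        print(shlex.join(cmd))
```

Keep in mind that the temporary files then count against RAM, so the tmpfs size puts a ceiling on how much data can be staged at once.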
We're planning to migrate our software to run in Kubernetes with autoscaling. This is our current infrastructure:
PHP and Apache are running on a Google Compute Engine n1-standard-4 (4 vCPUs, 15 GB memory)
MySQL is running in Google Cloud SQL
Data files (csv, pdf) and the code are stored on a single SSD Persistent Disk
I found many posts that recommend storing the data files in Google Cloud Storage and using the API to fetch and upload them. We have very limited time, so I decided to use NFS to share the data files across the pods. The problem is that NFS is slow: I get around 100mb/s when copying a file with pv, while iperf reports 1.96 Gbits/sec. Do you know how to achieve the same result without implementing Cloud Storage, or how to increase the NFS speed?
Data files (csv, pdf) and the code are stored on a single SSD Persistent Disk
There's nothing stopping you from volume mounting an SSD into the Pod so you can continue to use an SSD. I can only speak to AWS terminology, but some EC2 instances come with "local" SSD hardware, and thus you would only need to use a nodeSelector to ensure your Pods were scheduled onto machines that had said local storage available.
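To make the nodeSelector idea concrete, here is a minimal sketch of a Pod manifest built as a Python dict. The node label (`disktype: ssd`) and the host path are hypothetical; you would use whatever label and mount path your SSD-equipped nodes actually have.

```python
import json


def ssd_pod(name, image, mount_path,
            node_label=("disktype", "ssd"), host_path="/mnt/disks/ssd0"):
    """Build a Pod manifest pinned to SSD nodes via nodeSelector.

    node_label and host_path are hypothetical defaults for illustration.
    """
    key, value = node_label
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            # Only schedule onto nodes carrying this label.
            "nodeSelector": {key: value},
            "containers": [
                {
                    "name": name,
                    "image": image,
                    "volumeMounts": [{"name": "data", "mountPath": mount_path}],
                }
            ],
            # Expose the node's local SSD directory to the container.
            "volumes": [{"name": "data", "hostPath": {"path": host_path}}],
        },
    }


if __name__ == "__main__":
    # "web", "php:apache" and "/var/www/data" are placeholder values.
    print(json.dumps(ssd_pod("web", "php:apache", "/var/www/data"), indent=2))
```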
Where you're going to run into problems is if you are currently just using one php+apache and thus just one SSD, but now you want to scale the application up and it requires that all php+apache have access to the same SSD. That's a classic distributed application architecture problem, and something kubernetes itself can't fix for you.
If you're willing to expend the effort, you can also try any one of the other distributed filesystems (Ceph, GlusterFS, etc) and see if they perform better for your situation. Then again, "We have very limited time" I guess pretty much means that's off the table.
I would like to know if there is a way to mount a Google Cloud Storage bucket as a folder and, the first time we read a file, cache it locally (so that later reads won't use money/bandwidth).
GCSFUSE has two types of caching available: stat caching and type caching. You can refer to this document, which provides detailed information on these types of caching and their trade-offs.
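As a rough sketch, both caches are tuned with TTL flags when mounting. The code below only builds the command; the bucket, mount point and TTL values are placeholders, and I'm assuming a gcsfuse version that still accepts `--stat-cache-ttl` and `--type-cache-ttl` (check `gcsfuse --help` for your version). Note that these cache metadata lookups rather than file contents.

```python
import shlex


def gcsfuse_cached_mount(bucket, mount_point, stat_ttl="10m", type_ttl="10m"):
    """Build a gcsfuse mount command with longer metadata cache TTLs.

    TTLs are duration strings such as "90s" or "10m". Nothing is executed here.
    """
    return [
        "gcsfuse",
        "--stat-cache-ttl", stat_ttl,   # how long object stat results are reused
        "--type-cache-ttl", type_ttl,   # how long file-vs-directory answers are reused
        bucket,
        mount_point,
    ]


if __name__ == "__main__":
    # "my-bucket" and "/mnt/gcs" are placeholder values.
    print(shlex.join(gcsfuse_cached_mount("my-bucket", "/mnt/gcs")))
```

Longer TTLs mean fewer requests to GCS, at the cost of possibly seeing stale metadata if the bucket is modified from elsewhere.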
I'm pretty new with EC2 and backing up data, but currently, the app that I've built has no backup strategy and I want to know how to build a proper one. Currently, I have my RoR app and my MongoDB database on one instance. I've just now read about EBS volumes and snapshots, but I just can't wrap my head around it.
Supposedly EBS can be used as a datastore. If that is so, how do I set up a MongoDB database in EBS and migrate the data I have in my EC2 instance to it? I'm not familiar with configuring EBS and I've read the documentation and have more questions than answers.
In short, my instance is ephemeral storage right now and I want to turn it into persistent storage.
Thank you,
Don
It is pretty simple.
EBS provides network-attached disk volumes that are used to store data.
A snapshot is a compressed image backup. You can snapshot EC2 instances, RDS instances, and EBS volumes themselves. Once a snapshot is created it has to be stored somewhere, and AWS stores EBS snapshots in S3 on your behalf.
Configuring EBS is not difficult; it is little different from installing a new hard drive. You just need to "attach" an EBS volume to your instance, then do the usual OS disk initialisation work inside the EC2 instance.
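In command form, attaching and initialising a volume might look like the sketch below. It only builds the commands. The volume ID, instance ID, device names and mount point are placeholders, ext4 is an assumed filesystem choice, and the device name seen inside the instance can differ from the one given at attach time (for example on NVMe-based instance types).

```python
import shlex


def ebs_attach_commands(volume_id, instance_id, attach_device, os_device, mount_point):
    """Return commands to attach an EBS volume, format it, and mount it.

    attach_device is the name passed to AWS; os_device is the name the OS
    actually shows. Nothing is executed; the result is a list of argv lists.
    """
    attach = [
        "aws", "ec2", "attach-volume",
        "--volume-id", volume_id,
        "--instance-id", instance_id,
        "--device", attach_device,
    ]
    # Run these two inside the instance. mkfs destroys existing data,
    # so only format a brand-new, empty volume.
    make_fs = ["mkfs", "-t", "ext4", os_device]
    mount = ["mount", os_device, mount_point]
    return [attach, make_fs, mount]


if __name__ == "__main__":
    # All IDs and paths below are placeholders.
    cmds = ebs_attach_commands("vol-EXAMPLE", "i-EXAMPLE", "/dev/sdf", "/dev/xvdf", "/data")
    for cmd in cmds:
        print(shlex.join(cmd))
```

Once the volume is mounted, you would stop MongoDB, copy its data directory onto the new mount, point MongoDB's data path at it, and start it again.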
Because EBS storage is elastic, you can extend the disk space any time you need to, as long as your EC2 instance's OS supports it (although taking a backup first is recommended).
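A sketch of the "back up, then extend" sequence, again only building the commands. The volume ID, device and new size are placeholders, and the last step assumes an ext4 filesystem sitting directly on the device; a partitioned disk would need the partition grown first.

```python
import shlex


def ebs_grow_commands(volume_id, new_size_gib, os_device, description="pre-resize backup"):
    """Return commands to snapshot an EBS volume, enlarge it, and grow ext4.

    Nothing is executed; the result is a list of argv lists.
    """
    snapshot = [
        "aws", "ec2", "create-snapshot",
        "--volume-id", volume_id,
        "--description", description,
    ]
    resize = [
        "aws", "ec2", "modify-volume",
        "--volume-id", volume_id,
        "--size", str(new_size_gib),
    ]
    # Run inside the instance once the volume modification has taken effect.
    grow_fs = ["resize2fs", os_device]
    return [snapshot, resize, grow_fs]


if __name__ == "__main__":
    # "vol-EXAMPLE" and "/dev/xvdf" are placeholders.
    for cmd in ebs_grow_commands("vol-EXAMPLE", 100, "/dev/xvdf"):
        print(shlex.join(cmd))
```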
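From an operations perspective, though, if the database runs 24x7 you may want to consider a managed database service such as RDS, so you don't have to deal with DB installation, complicated replication updates, etc. (Note that RDS does not offer MongoDB as an engine, so for this app that would mean a MongoDB-compatible managed service or a different database.) If you only run the DB occasionally, you might want to stick with MongoDB on the EC2 instance.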