How do we create our own scalable storage buckets with Kubernetes?

Instead of using Google Cloud or AWS storage buckets, how do we create our own scalable storage bucket?
For example, if someone were to hit a photo 1 billion times a day, what would be the options here? Assume the photo is user-generated rather than app-generated.
If I have asked this in the wrong place, please redirect me.

As an alternative to Google Cloud or AWS object storage, you could consider using something like MinIO.
It's easy to set up and it can run in Kubernetes. All you need is a PersistentVolumeClaim to write your data to, although you could use emptyDir volumes with ephemeral storage just to evaluate the solution.
A less obvious alternative would be something like Ceph. It's more complicated to set up, but it goes beyond object storage: if you also need block storage for your Kubernetes cluster, Ceph can provide that (RADOS Block Devices) while offering object storage as well (RADOS Gateway).
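Once MinIO is running in the cluster, talking to it from application code looks much like talking to S3. Below is a minimal sketch using the official `minio` Python client; the endpoint, credentials, and bucket name are placeholders for whatever your deployment actually exposes.

```python
# Minimal sketch of using a self-hosted MinIO instance with the official
# "minio" Python client. Endpoint, credentials, and bucket name are
# hypothetical placeholders.
from minio import Minio

client = Minio(
    "minio.example.internal:9000",  # hypothetical in-cluster Service endpoint
    access_key="MINIO_ACCESS_KEY",
    secret_key="MINIO_SECRET_KEY",
    secure=False,                   # switch to True once TLS is in place
)

bucket = "user-photos"
if not client.bucket_exists(bucket):
    client.make_bucket(bucket)

# Upload a user-generated photo...
client.fput_object(bucket, "photo-123.jpg", "/tmp/photo-123.jpg",
                   content_type="image/jpeg")

# ...and read it back.
response = client.get_object(bucket, "photo-123.jpg")
try:
    data = response.read()
finally:
    response.close()
    response.release_conn()
```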

Related

AWS EBS how to get file change notifications

I am using kubernetes and using AWS EBS as a persistent volume.
Is there any way of detecting changes on the EBS volume such as a file update/add/delete?
It could be anything really: Lambda, SNS or SQS, file system hooks, etc.?
No, this is not possible, and that's largely by design. EBS is treated as a block device, unlike S3, where objects are handled individually and offer object-level notifications.
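For contrast, here is a hedged boto3 sketch of the kind of object-level notification S3 does support; the bucket name and SQS queue ARN are hypothetical placeholders, and nothing comparable exists for a raw EBS volume.

```python
# Sketch only: S3 (not EBS) supports object-level event notifications.
# The bucket name and SQS queue ARN are hypothetical placeholders.
import boto3

s3 = boto3.client("s3")
s3.put_bucket_notification_configuration(
    Bucket="my-example-bucket",
    NotificationConfiguration={
        "QueueConfigurations": [
            {
                "QueueArn": "arn:aws:sqs:us-east-1:123456789012:my-example-queue",
                "Events": ["s3:ObjectCreated:*", "s3:ObjectRemoved:*"],
            }
        ]
    },
)
```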

PostgreSQL data_directory on Google Cloud Storage, possible?

I am new to Google Cloud and was wondering if it is possible to run a PostgreSQL container on Cloud Run with PostgreSQL's data_directory pointed at Cloud Storage?
If so, could you please point me to some tutorials/guides on this topic? And what are the downsides of this approach?
Edit-0: Just to clarify what I am trying to achieve:
I am learning Google Cloud and want to write a simple application to work along with it. I have decided that the backend code will run as a container under Cloud Run and the persistent data (i.e. the database files) will reside on Cloud Storage. Because this is a small app for learning purposes, I am trying to use as few moving parts as possible on the backend (and also ones that are always free). Both PostgreSQL and the backend code will reside in the same container, except for the actual data files, which will live in Cloud Storage. Is this approach correct? Are there better approaches to achieve the same minimalism?
Edit-1: Okay, I got the answer! The Google documentation here mentions the following:
"Don't run a database over Cloud Storage FUSE!"
Buckets are not meant to store database information; some of the relevant limits are the following:
There is no limit to writes across multiple objects, which includes uploading, updating, and deleting objects. Buckets initially support roughly 1000 writes per second and then scale as needed.
There is no limit to reads of objects in a bucket, which includes reading object data, reading object metadata, and listing objects. Buckets initially support roughly 5000 object reads per second and then scale as needed.
One alternative is to use a separate persistent disk for your PostgreSQL database on Google Compute Engine. You can follow the “How to Set Up a New Persistent Disk for PostgreSQL Data” Community Tutorial.

Azure Service Fabric deployment

I am deploying an API to Service Fabric nodes, and by default it goes to the D drive (the temp drive). I would like to change this default behavior and deploy it to another drive, or the C drive, to avoid application loss in case of VMSS deallocation. How can I do this?
You say you want to do this to avoid application loss, however:
SF already replicates your application package to multiple machines when you Register the application package in the Image Store (part of the provisioning/deployment process).
Generally, if you want your application code and config to be safe, keeping it somewhere outside the cluster (wherever you're deploying from, or in blob storage) is usually a better answer.
SF doesn't really support deallocating the VMs out from under it and then bringing them back later. See the FAQ answer here.
So overall I'm not sure that what you're trying to do is the right solution to your real problem, and it looks like you're heading into multiple unsupported scenarios, which usually means there's some misunderstanding.
That all said, of course, it's configurable.
Within a node type, you can specify the dataPath (example here). However, it's not recommended that you change this.
"settings": {
"dataPath": "D:\\\\SvcFab",
},

Upgrading amazon EC2 m1.large instance to m3.large with mongodb installed

If I were to upgrade an amazon instance, I'd create a snapshot of the image and create the new instance from this image and then upgrade that instance.
My question relates to MongoDB and the best way to upgrade from an m1.large to an m3.large instance; basically, m3s are cheaper and more powerful than the old m1s.
I currently have MongoDB running on the m1.large instance, backed by 3 EBS volumes for storage, journaling, and logs (essentially the MongoDB image config from the Marketplace).
When I went through setting up the new m3.large instance, I noticed that it's not EBS Optimized.
Working with MongoDB and the current config, I assume that for optimal performance it's desirable to go the EBS Optimized route; if that's the case, is the best upgrade path to go for an m3.xlarge? Would I hit a big performance penalty if I went with an m3.large?
And lastly... after taking a snapshot of an image (specifically an image backed by EBS volumes), does the new image keep the same config? I.e. will the new image be backed by the same volumes?
I know I can stop and start the current instance, but I want to minimise any downtime.
Any help appreciated!
Firstly, you don't need to create an entirely new instance, snapshot the EBS volumes of the old one, and attach the copies. If you're doing this to try to avoid a service interruption, what happens when you switch the EIP from the old to the new instance? Yep - service interruption.
Just stop the m1, change its instance type to the m3, and start it again. There will be an outage, of course, but you'll be back up in less than 5 minutes and you'll have saved yourself a chunk of work replicating volumes.
As for EBS Optimised - do you really need that? Do you understand what it means, and what the consequences of NOT having it on the new instance are? If the answers to both are YES, then of course pick an m3 (or larger) instance type that supports it. If NO, research until you know what the feature gives you and whether you actually need it (you pay more with it active - don't spend more than you actually need to).
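For what it's worth, the stop / change-type / start sequence is short enough to script. Here is a rough boto3 sketch with a placeholder instance ID; it assumes you accept the few minutes of downtime mentioned above.

```python
# Rough sketch of the stop -> change instance type -> start sequence
# with boto3. The instance ID is a placeholder; expect a short outage.
import boto3

ec2 = boto3.client("ec2")
instance_id = "i-0123456789abcdef0"  # hypothetical

ec2.stop_instances(InstanceIds=[instance_id])
ec2.get_waiter("instance_stopped").wait(InstanceIds=[instance_id])

ec2.modify_instance_attribute(
    InstanceId=instance_id,
    InstanceType={"Value": "m3.large"},
)

ec2.start_instances(InstanceIds=[instance_id])
ec2.get_waiter("instance_running").wait(InstanceIds=[instance_id])
```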

Store files on disk or MongoDB

I am creating a MongoDB/Node.js blogging system (similar to WordPress).
I currently have the images saved on disk with a pointer stored in Mongo. Since I already keep all sessions in MongoDB to enable easy load balancing across servers, I was wondering whether storing the actual files in Mongo would also be a smart idea for easy multi-server setups and/or performance gains.
If everything is stored in the DB, you can simply spawn more web servers and/or Mongo replicas to scale horizontally.
Opinions?
MongoDB is a good option for storing your files (I'm talking about GridFS), especially for the use case you described above.
When you store files in MongoDB (GridFS, not regular documents), you get all the replication and sharding capability for free, which is awesome.
If you have to spawn a new server and the files are already in MongoDB, all you have to do is enable replication (and thus scale horizontally). I'm sure this can save you a lot of headaches.
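As a rough illustration of how little application code GridFS needs, here is a minimal pymongo sketch; the connection string, database name, and file names are placeholders.

```python
# Minimal GridFS sketch with pymongo. Connection string, database name,
# and file names are placeholders. Files stored this way are replicated
# along with the rest of your data.
import gridfs
from pymongo import MongoClient

db = MongoClient("mongodb://localhost:27017")["blog"]
fs = gridfs.GridFS(db)

# Store an uploaded image and keep the returned _id as the "pointer".
with open("/tmp/header.png", "rb") as f:
    file_id = fs.put(f, filename="header.png",
                     metadata={"contentType": "image/png"})

# Later, stream it back out to serve it.
stored = fs.get(file_id)
image_bytes = stored.read()
```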
Resources:
Is GridFS fast and reliable enough for production?
http://www.mongodb.org/display/DOCS/GridFS
http://www.coffeepowered.net/2010/02/17/serving-files-out-of-gridfs/
Aside from GridFS, if you're considering a cloud-based deployment, you might look at storing files in cloud-specific storage (Windows Azure has Blob Storage, for example). Sticking with Windows Azure for this example (since that's what I work with), you'd reference a file by its storage account URI. For example:
https://mystorageacct.blob.core.windows.net/mycontainer/myvideo.wmv
Since you'd be storing the MongoDB database itself in its own blob (mounted as a disk volume on your Linux or Windows VM), you could then choose to store your files in either the same storage account or a completely different one (with each storage account providing 200TB of storage).
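As a hedged sketch of that pattern with the current azure-storage-blob Python SDK (the SDK has changed since the Windows Azure days; the connection string, container, and blob names are placeholders):

```python
# Sketch of uploading a file to Azure Blob Storage and getting back the
# URL you would store in MongoDB. Connection string, container, and blob
# names are hypothetical placeholders.
from azure.storage.blob import BlobServiceClient

service = BlobServiceClient.from_connection_string("<your-connection-string>")
blob = service.get_blob_client(container="mycontainer", blob="myvideo.wmv")

with open("/tmp/myvideo.wmv", "rb") as data:
    blob.upload_blob(data, overwrite=True)

# The URL you would keep as the pointer in your database, e.g.
# https://mystorageacct.blob.core.windows.net/mycontainer/myvideo.wmv
print(blob.url)
```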
Storing the image as a regular document in MongoDB would be a bad idea, as resources that could be used to serve your application's data would instead be spent shuttling large files.
Have a look at MongoDB's file storage, GridFS; that might solve your problem of storing images while providing horizontal scalability as well.
http://www.mongodb.org/display/DOCS/GridFS
http://www.mongodb.org/display/DOCS/GridFS+Specification