Upgrading an Amazon EC2 m1.large instance to m3.large with MongoDB installed

If I were to upgrade an Amazon instance, I'd create a snapshot of the current instance, create a new instance from that image, and then upgrade the new instance.
My questions relate to MongoDB and the best way to upgrade from an m1.large to an m3.large instance - basically, m3s are cheaper and more powerful than the old m1s.
I currently have MongoDB running on the m1.large instance backed by 3 EBS volumes for storage, journalling and logs (essentially the MongoDB image config from the Marketplace).
When I went through setting up the new m3.large instance, I noticed that it's not EBS Optimized.
Working with MongoDB and the current config, I assume it's desirable to go the EBS Optimized route for optimal performance - if that's the case, is the best upgrade path to go for an m3.xlarge? Would I hit a big performance penalty if I went with an m3.large?
And lastly, after taking a snapshot of an image (specifically an image backed by EBS volumes), does the new image keep the same config, i.e. will the new instance be backed by the same volume setup?
I know I can stop and start the current instance, but I want to minimise any downtime.
Any help appreciated!

Firstly, you don't need to create an entirely new instance, snapshot the EBS volumes of the old one, and attach the copies. If you're doing this to try to avoid service interruption, what happens when you switch the EIP from the old to the new instance? Yep - service interruption.
Just stop the m1, change its instance type to m3, and start it again. There will be an outage, of course, but you'll be back in under five minutes and you'll have saved yourself a chunk of work replicating volumes.
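If you want to script that, here's a minimal sketch using boto3; the region and instance ID are placeholders, and the type change only works while the instance is stopped:

```python
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")  # assumed region
instance_id = "i-0123456789abcdef0"  # hypothetical instance ID

# Stop the instance and wait until it is fully stopped.
ec2.stop_instances(InstanceIds=[instance_id])
ec2.get_waiter("instance_stopped").wait(InstanceIds=[instance_id])

# Change the instance type while it is stopped.
ec2.modify_instance_attribute(
    InstanceId=instance_id,
    InstanceType={"Value": "m3.large"},
)

# Start it again; the attached EBS volumes come back with it.
ec2.start_instances(InstanceIds=[instance_id])
ec2.get_waiter("instance_running").wait(InstanceIds=[instance_id])
```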
As for EBS Optimised - do you really need that? Do you understand what it means, and what the consequences of NOT having it on the new instance are? If the answers to both are YES, then of course pick an m3 (or larger) instance type that supports it. If NO, research until you know what the feature gives you and whether you actually need it (you pay more with it active - don't spend more than you actually need to).
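For what it's worth, if you decide you do need it, the flag can be inspected and toggled on a stopped instance of a supported type. A rough boto3 sketch, again with a hypothetical instance ID:

```python
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")  # assumed region
instance_id = "i-0123456789abcdef0"  # hypothetical instance ID

# Check whether the instance currently has EBS optimization enabled.
reservations = ec2.describe_instances(InstanceIds=[instance_id])["Reservations"]
instance = reservations[0]["Instances"][0]
print("EBS optimized:", instance.get("EbsOptimized", False))

# The flag can only be changed while the instance is stopped, and only
# on instance types that support EBS optimization.
ec2.modify_instance_attribute(
    InstanceId=instance_id,
    EbsOptimized={"Value": True},
)
```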

Related

How do we create our own scalable storage buckets with Kubernetes?

Instead of using Google Cloud or AWS storage buckets, how do we create our own scalable storage bucket?
For example, if someone were to hit a photo 1 billion times a day, what would the options be? Assume the photo is user-generated rather than generated by the app itself.
If I have asked this in the wrong place, please redirect me.
As an alternative to Google Cloud or AWS object storage, you could consider using something like MinIO.
It's easy to set up and can run in Kubernetes. All you need is a PersistentVolumeClaim to write your data to, although you could evaluate the solution with emptyDirs and ephemeral storage.
A less obvious alternative would be something like Ceph. It's more complicated to set up, but it goes beyond object storage. If you also need block storage for your Kubernetes cluster, Ceph can provide that (RADOS Block Devices) while offering object storage as well (RADOS Gateway).
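Once a MinIO deployment is exposed as a Service, applications talk to it with the familiar S3-style API. A minimal sketch using the minio Python client - the endpoint, credentials, bucket and object names are all hypothetical:

```python
from minio import Minio

# Hypothetical in-cluster endpoint and example credentials.
client = Minio(
    "minio.storage.svc.cluster.local:9000",
    access_key="EXAMPLE_ACCESS_KEY",
    secret_key="EXAMPLE_SECRET_KEY",
    secure=False,
)

# Create a bucket for user-generated photos if it doesn't exist yet.
if not client.bucket_exists("photos"):
    client.make_bucket("photos")

# Upload a photo and read it back.
client.fput_object("photos", "user-123/photo.jpg", "/tmp/photo.jpg")
response = client.get_object("photos", "user-123/photo.jpg")
data = response.read()
response.close()
response.release_conn()
```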

PostgreSQL data_directory on Google Cloud Storage, possible?

I am new to Google Cloud and was wondering whether it is possible to run a PostgreSQL container on Cloud Run with the data_directory of PostgreSQL pointed at Cloud Storage?
If that's possible, could you please point me to some tutorials/guides on this topic? And also, what are the downsides of this approach?
Edit-0: Just to clarify what I am trying to achieve:
I am learning Google Cloud and want to write a simple application to work along with it. I have decided that the backend code will run as a container under Cloud Run and the persistent data (i.e. the database files) will reside on Cloud Storage. Because this is a small app for learning purposes, I am trying to use as few moving parts as possible on the backend (and also ones that are always free). Both PostgreSQL and the backend code will reside in the same container, except for the actual data files, which will reside on Cloud Storage. Is this approach correct? Are there better approaches that achieve the same minimalism?
Edit-1: Okay, I got the answer! The Google documentation mentions the following:
"Don't run a database over Cloud Storage FUSE!"
Buckets are not meant to store database information; some of the relevant limits are the following:
There is no limit to writes across multiple objects, which includes uploading, updating, and deleting objects. Buckets initially support roughly 1000 writes per second and then scale as needed.
There is no limit to reads of objects in a bucket, which includes reading object data, reading object metadata, and listing objects. Buckets initially support roughly 5000 object reads per second and then scale as needed.
One alternative is to use a separate persistent disk for your PostgreSQL database on Google Compute Engine. You can follow the "How to Set Up a New Persistent Disk for PostgreSQL Data" community tutorial.

Starting and Stopping PostgreSQL Amazon RDS Instance Automatically Based on Usage

We're a team of 4 data scientists that use Amazon RDS PostgreSQL for analysis purposes, so we're looking for a way to automatically start/stop the instance based on usage as opposed to time.
For example, there are clearly solutions for starting and stopping automatically during regular business hours (Stopping an Amazon RDS DB Instance Temporarily).
However, this doesn't quite work for us because we all have different schedules and don't necessarily adhere to a standard schedule. I would like a script that basically checks whether the DB has been used in the past, say 30 minutes, and if not turn off the instance. Then, if someone tries to connect to the DB but it's turned off, then automatically turn it on. My intuition tells me that the latter is harder than the former, but I'm not sure. Is this possible?
To do this you would need to use a CloudWatch alarm, relying on metrics that are available to CloudWatch such as the number of connections or CPU utilization.
The alarm could trigger a Lambda function that stops your RDS instance. Be aware that a stopped RDS instance is automatically started again once it has been off for 7 days.
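A rough sketch of what such a Lambda handler might look like with boto3 - the instance identifier is a hypothetical placeholder, and the function assumes the alarm has already decided the database is idle:

```python
import boto3

rds = boto3.client("rds")

DB_INSTANCE_ID = "analytics-postgres"  # hypothetical instance identifier

def lambda_handler(event, context):
    """Stop the RDS instance when the low-usage alarm fires."""
    instance = rds.describe_db_instances(
        DBInstanceIdentifier=DB_INSTANCE_ID
    )["DBInstances"][0]

    # Only try to stop an instance that is currently running.
    if instance["DBInstanceStatus"] == "available":
        rds.stop_db_instance(DBInstanceIdentifier=DB_INSTANCE_ID)
        return {"stopped": True}

    return {"stopped": False, "status": instance["DBInstanceStatus"]}
```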
Alternatively, if you're able to use it, you could look into Aurora Serverless with the PostgreSQL-compatible version. This option automatically handles the stop/start functionality when no one is using the database.

Automatically change instance tier, good practice?

Is it good practice to automatically change the instance tier of a Cloud SQL database? For example, using a cheaper tier during the night when there is less demand.
Changing tier is a non-trivial operation.
If you are using first generation, changing tier will cause a restart; this means downtime, and the instance will start with a cold buffer pool.
If you are using second generation, the entire GCE instance backing your Cloud SQL instance will be rebuilt, which takes even longer than on first generation.
Overall, I don't recommend doing this just to save cost.

Store files on disk or MongoDB

I am creating a MongoDB/Node.js blogging system (similar to WordPress).
I currently have the images being saved on the disk and a pointer being placed in mongo. I was wondering since I have all sessions being stored in MongoDB to enable easy load balancing across servers, would storing the actual files in Mongo also be a smart idea for easy multiserver setups and/or performance gains.
If everything is stored in a DB, you can simply spawn more web servers and/or mongo replicas to scale horizontally.
Opinions?
MongoDB is a good option for storing your files (I'm talking about GridFS), especially for the use case you described above.
When you store files in MongoDB (GridFS, not regular documents), you get all the replication and sharding capability for free, which is awesome.
If you have to spawn a new server and you already have the files in MongoDB, all you have to do is enable replication (and thus scale horizontally). I'm sure this can save you a lot of headaches.
Resources:
Is GridFS fast and reliable enough for production?
http://www.mongodb.org/display/DOCS/GridFS
http://www.coffeepowered.net/2010/02/17/serving-files-out-of-gridfs/
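To give a sense of how little code this takes, here is a minimal sketch using PyMongo's gridfs module (the connection string and file name are placeholders); the Node.js driver exposes an equivalent GridFS API:

```python
import gridfs
from pymongo import MongoClient

# Placeholder connection string and database name.
db = MongoClient("mongodb://localhost:27017")["blog"]
fs = gridfs.GridFS(db)

# Store an uploaded image; GridFS chunks it into the fs.files and
# fs.chunks collections, which replicate and shard like any other data.
with open("header.jpg", "rb") as f:
    file_id = fs.put(f, filename="header.jpg")

# Stream it back out when any web server in the pool needs to serve it.
image = fs.get(file_id)
data = image.read()
```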
Aside from GridFS, you might be considering a cloud-based deployment. In that case, you might consider storing files in cloud-specific storage (Windows Azure has Blob Storage, for example). Sticking with Windows Azure for this example (since that's what I work with), you'd reference a file by its storage account URI. For example:
https://mystorageacct.blob.core.windows.net/mycontainer/myvideo.wmv
Since you'd be storing the MongoDB database itself in its own blob (mounted as a disk volume on your Linux or Windows VM), you could then choose to store your files in either the same storage account or a completely different storage account (with each storage account providing up to 200 TB of storage).
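As a rough illustration of that pattern with the current azure-storage-blob Python package (the account URL, credential and file name below are placeholders, and this SDK postdates the original answer):

```python
from azure.storage.blob import BlobServiceClient

# Placeholder account URL and credential.
service = BlobServiceClient(
    account_url="https://mystorageacct.blob.core.windows.net",
    credential="EXAMPLE_ACCOUNT_KEY",
)
blob = service.get_blob_client(container="mycontainer", blob="myvideo.wmv")

# Upload the file, then reference it in MongoDB by its storage-account URI.
with open("myvideo.wmv", "rb") as f:
    blob.upload_blob(f, overwrite=True)

print(blob.url)  # https://mystorageacct.blob.core.windows.net/mycontainer/myvideo.wmv
```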
Storing the images as regular documents in MongoDB would be a bad idea, as resources that could be used to serve a large amount of informational data would instead be spent serving files.
Have a look at MongoDB's file storage, GridFS; that might solve your problem of storing images while providing horizontal scalability as well.
http://www.mongodb.org/display/DOCS/GridFS
http://www.mongodb.org/display/DOCS/GridFS+Specification