Backing up EC2 instance from Ephemeral to Persistent Storage - mongodb

I'm pretty new with EC2 and backing up data, but currently, the app that I've built has no backup strategy and I want to know how to build a proper one. Currently, I have my RoR app and my MongoDB database on one instance. I've just now read about EBS volumes and snapshots, but I just can't wrap my head around it.
Supposedly EBS can be used as a datastore. If that is so, how do I set up a MongoDB database in EBS and migrate the data I have in my EC2 instance to it? I'm not familiar with configuring EBS and I've read the documentation and have more questions than answers.
In short, my instance is ephemeral storage right now and I want to turn it into persistent storage.
Thank you,
Don

It is pretty simple.
EBS provides network-attached disk volumes that are used to store data.
A snapshot is a compressed image backup. Snapshots can be taken of EC2 instances (through their EBS volumes), RDS instances, and individual EBS volumes. Once a snapshot is created it has to be stored somewhere; AWS keeps EBS snapshots in Amazon S3 and manages that storage on your behalf.
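As a rough illustration of taking a snapshot from a script, here is a minimal Python sketch. It assumes the boto3 library, with `ec2` being a client created by `boto3.client("ec2")`; the function name `snapshot_volume`, the default label, and any volume ID you pass in are placeholders of mine, not something from your setup. The client is passed in as an argument so the function can be tried without AWS credentials.

```python
from datetime import datetime, timezone


def snapshot_volume(ec2, volume_id, label="mongodb-backup"):
    """Request a point-in-time snapshot of one EBS volume.

    `ec2` is expected to behave like boto3.client("ec2"); the volume ID
    and label are placeholders chosen for this example.
    """
    stamp = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    response = ec2.create_snapshot(
        VolumeId=volume_id,
        Description=f"{label} {stamp}",
    )
    # The snapshot completes asynchronously; the ID can be polled later.
    return response["SnapshotId"]
```

For a database volume you would probably want MongoDB's writes flushed or paused before the snapshot is requested so the copy is consistent; that step is not shown here.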
Configuring EBS is not difficult; it is not much different from installing a new hard drive. You just need to "attach" an EBS volume to your instance and then, inside the EC2 instance, do the usual OS disk initialisation work (create a filesystem and mount it).
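The attach step and the disk initialisation could look roughly like the sketch below. It again assumes boto3 with `ec2` standing in for `boto3.client("ec2")`. The device names (`/dev/sdf` on the AWS side, `/dev/xvdf` inside the OS), the `gp3` volume type, the ext4 filesystem and the `/data` mount point are all assumptions; the device name the OS actually shows can differ by instance type.

```python
def attach_data_volume(ec2, instance_id, zone, size_gib=100, device="/dev/sdf"):
    """Create an EBS volume and attach it to an instance.

    `ec2` is expected to behave like boto3.client("ec2"). The volume must
    be created in the same availability zone as the instance.
    """
    volume = ec2.create_volume(
        AvailabilityZone=zone,
        Size=size_gib,        # size in GiB
        VolumeType="gp3",     # assumed volume type for this example
    )
    volume_id = volume["VolumeId"]
    # Block until the new volume is ready to be attached.
    ec2.get_waiter("volume_available").wait(VolumeIds=[volume_id])
    ec2.attach_volume(VolumeId=volume_id, InstanceId=instance_id, Device=device)
    return volume_id


def os_init_commands(block_device="/dev/xvdf", mount_point="/data"):
    """Shell commands for the usual disk initialisation on Linux."""
    return [
        f"sudo mkfs -t ext4 {block_device}",   # format (destroys existing data)
        f"sudo mkdir -p {mount_point}",
        f"sudo mount {block_device} {mount_point}",
    ]
```

Once the volume is mounted, migrating MongoDB would roughly mean stopping mongod, copying the existing data directory onto the new mount, pointing `storage.dbPath` in the mongod configuration at the new location, and starting mongod again.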
Because EBS volumes are resizable, you can extend the disk space whenever you need to, as long as the OS on your EC2 instance supports it (although taking a backup first is recommended).
From an operations perspective, if the database runs 24x7x365 you may want to consider a managed database service so that you don't have to deal with DB installation, complicated replication, updates, etc. Note that RDS itself does not offer MongoDB; the managed equivalent would be a MongoDB-compatible service such as Amazon DocumentDB or a hosted MongoDB offering. If you only run the DB occasionally, you might want to stick with MongoDB on the EC2 instance.

Related

Microservice Application ... Docker Volume for Databases or no Docker Volume?

I have an application (JHipster Gateway, UAA, Registry, 5 microservices) and each application source builds a Docker image and pushes to GitLab registry. Currently I'm running everything on Rancher using a Docker-Compose file. My volumes for Mongo databases are currently in each container.
I need advice about volume mounts. Here are my options as I see them.
Leave data in containers and monitor and backup
Use external mounts and monitor volumes on host.
If I leave the Mongo data in the containers, do I just set up a cluster so that the database simply scales when the internal volumes fill up? I am looking for some explanation to help me choose between internal and external (on host) mounts for the Mongo databases.
Thanks in advance,
David L. Whitehurst
Never store any data you care about directly in containers. There are good arguments in favor of both named volumes (native to Docker, some support in a multi-host Swarm environment, fewer host-specific dependencies) and host bind mounts (much easier to back up and maintain, possible to examine directly if needed) but use some sort of mounted storage.
The most important note here is that it's fairly routine to delete and recreate containers. If the software you're running or its underlying library stack has a security issue, you generally need to get (or build) an updated image, delete your existing container, and rebuild it against the new image. If data is stored only inside a container, then during this very routine delete-and-recreate operation, there's significant risk of losing data.
In principle, if you're really careful, and you have a replicated data store, you can roll this over without external volumes and not lose data. It's tricky, and takes a lot of patience; you'll be forced to take down one replica, wait for its data to be rebalanced across the other replicas, start up a new replica, wait for it to accept some of the data, and so on. If you can take a point release by stopping a container, deleting it, starting a new one with the same data store, and have it come up instantly with populated data, that's much easier to manage.
(The other corollary here is that you don't "back up containers", since they don't have any data you care about. You do back up the data stored on the host or in Docker named volumes, and you can always recreate the container from its image plus the external data.)
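To make the mounted-storage point concrete, here is a small Python sketch that builds a `docker run` command keeping MongoDB's data in a named volume. The container name `mongo1` and volume name `mongo_data` are made up for illustration; `/data/db` is the directory where the official `mongo` image keeps its data files.

```python
def mongo_run_command(container="mongo1", volume="mongo_data", image="mongo"):
    """Build a `docker run` argv that keeps MongoDB data in a named volume.

    The container and volume names are hypothetical defaults.
    """
    return [
        "docker", "run", "-d",
        "--name", container,
        "-v", f"{volume}:/data/db",   # named volume outlives the container
        image,
    ]
```

Passing the result to `subprocess.run(..., check=True)` would start the container. Deleting the container and running the same command against a newer image should bring the database back with the data already in the volume.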

Accessing mongodb data on aws instance

Due to a hardware issue my AWS instance stopped functioning. The team suggested that I stop and start the instance.
AWS has now provided a new IP, where all data is present. I had installed MongoDB and had a couple of databases there.
When I checked the new server, MongoDB was not working. I started mongod and was then asked to create the /data/db directory. Now MongoDB is functioning, but when I run "show databases" none of my previous databases appear. Any help on getting this data back?
An AWS EC2 instance has two types of storage: ephemeral storage and EBS volume storage.
Ephemeral storage should be used for temporary data only. If you reboot your EC2 instance the data on it will not be lost, but if you stop and then start the instance you lose it all. When you try to stop an EC2 instance, AWS gives you this message:
Note that when your instances are stopped: Any data on the ephemeral
storage of your instances will be lost.
Ephemeral storage is provisioned very close to the instance, and because of that it is faster.
EBS is persistent storage that is independent of your EC2 instance. It can be attached to and detached from your EC2 instance. This is the kind of storage you want to use when running a database inside your instance.
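To check which disk a MongoDB data directory actually lives on, something like the following Python sketch can help. It only parses text in the format of Linux's `/proc/mounts`; the function name `backing_mount` and the device names in the usage note are illustrative. Whether the reported device is an EBS volume or instance store still has to be checked against the instance's block device mapping.

```python
import posixpath


def backing_mount(path, mounts_text):
    """Return (device, mount_point) of the filesystem that holds `path`.

    `mounts_text` is text in the format of Linux's /proc/mounts:
    one "device mount_point fstype options ..." entry per line.
    Returns None if no entry matches.
    """
    path = posixpath.normpath(path)
    best = None
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 2:
            continue
        device, mount_point = fields[0], fields[1]
        prefix = mount_point.rstrip("/") + "/"
        if path == mount_point or path.startswith(prefix):
            # Prefer the longest (most specific) matching mount point.
            if best is None or len(mount_point) > len(best[1]):
                best = (device, mount_point)
    return best
```

On the server itself this could be called as `backing_mount("/data/db", open("/proc/mounts").read())`. If the result is the root device rather than a dedicated data volume, the database files are sitting wherever the root filesystem lives.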

AMI for EC2 instance with a MongoDB?

I am running an Amazon EC2 instance with a MongoDB running on it.
Since I will only need it from time to time, I was wondering whether it is possible to keep just an image of the system between uses, using an Amazon Machine Image. Any ideas?
You can actually create an AMI from your server and then terminate the server when you don't need it.
When you need it again you can launch a new server based on the AMI you created. The downside is that the AMI only contains your data as of the moment it was created, so anything written afterwards is not in it. I therefore recommend creating the AMI right before you terminate the server.
Another alternative is to use EBS-backed storage/instances and simply shut down (stop) the instance when you don't need it, then start it again when you do. There's little cost associated with keeping an EBS volume around; certainly much less than keeping your EC2 instance running all the time.
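Both approaches can be scripted. The Python sketch below assumes boto3, with `ec2` standing in for `boto3.client("ec2")`; the function names and any instance ID or image name you pass are placeholders.

```python
def image_then_terminate(ec2, instance_id, image_name):
    """Create an AMI from an instance, wait for it, then terminate the instance."""
    image_id = ec2.create_image(InstanceId=instance_id, Name=image_name)["ImageId"]
    # Do not terminate until the image is fully available.
    ec2.get_waiter("image_available").wait(ImageIds=[image_id])
    ec2.terminate_instances(InstanceIds=[instance_id])
    return image_id


def stop_until_needed(ec2, instance_id):
    """Stop an EBS-backed instance; its EBS volumes are kept."""
    ec2.stop_instances(InstanceIds=[instance_id])
```

Waiting for the image before terminating is the important part of the first function: terminating too early risks losing the only copy of the data.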
Hope this helps.
Amazon does not charge you instance hours for a machine that is stopped.
You get charged for:
Online time
Storage space (presumably you store the image on S3 [EBS])
Elastic IP addresses
Bandwidth
Amazon does, however, charge you for the storage of the AMIs you create.
So you can stop your machine and just start it when you need to use it.

Horizontal Scalability and disk storage

I am trying to understand how horizontal scalability (virtualization) works in terms of disk storage.
Virtualization is a layer on top of the hardware nodes that manages the resources needed to serve requests.
So my question is: what happens when I deploy my WAR to the web server, for example? Do I have replicated storage on different nodes?
After doing some research I found NAS vs SAN, so I expect to have SAN replication for data stability. Is that true?
Where exactly is my storage disk when I have a horizontally scaled server like Google Engine or AWS?
Thanks,
Hopefully a couple of these examples will help. Let's take a general, crude example. I'll try to keep information simple to understand. Let's say I have a business running on LAMP stack. Apache+PHP is running on WEB1 server, MySQL on DB1 server. Customer data sits on WEB1.
SAN replication
First - your question about replication. That's mostly for disaster recovery. For data stability/reliability, SANs have appropriate RAID levels, service level agreements and spare disks. For example, RAID5 tolerates the failure of 1 disk in a RAID set, RAID6 tolerates the failure of 2 disks in a RAID set, etc. Having hot-spare disks helps in quickly rebuilding a failed RAID disk. Organizations also snapshot their disk volumes and replay them in a different data center so as to have a 2nd copy of their data. This is done over and above regular backups and VM snapshots.
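The RAID trade-off can be put into numbers with a tiny Python sketch. The function name `raid_summary` and the disk sizes in the usage note are arbitrary; the arithmetic is just the standard parity overhead of one disk's worth of capacity for RAID5 and two for RAID6.

```python
PARITY_DISKS = {"RAID5": 1, "RAID6": 2}


def raid_summary(level, disks, disk_size_gb):
    """Return (usable_gb, tolerated_disk_failures) for a parity RAID set."""
    parity = PARITY_DISKS[level]
    # RAID5 needs at least 3 disks, RAID6 at least 4.
    if disks < parity + 2:
        raise ValueError(f"{level} needs at least {parity + 2} disks")
    usable_gb = (disks - parity) * disk_size_gb
    return usable_gb, parity
```

With four 100GB disks, `raid_summary("RAID5", 4, 100)` gives 300GB usable while surviving one disk failure; RAID6 on the same disks gives 200GB while surviving two.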
AWS disks
There are 2 types of disks AWS has:
Ephemeral: disks connected to EC2 instance
Elastic Block Storage (EBS)
Ephemeral storage
Don't use this for anything critical. AWS offers EC2 instances with ephemeral storage (that is, the VM uses disks attached to the host server) and recommends that users purchase a slice of disk in the form of EBS (Elastic Block Storage) for data that must persist. I'd choose not to run anything on ephemeral storage because if the EC2 instance stops, the information on ephemeral storage is gone! However, if my partitions are on an EBS volume, an EC2 restart will be seamless. All data will stay alive on my EBS volume.
EBS
When I want a VM, I'd choose an EC2 instance (CPU/Memory). Then I buy disk in the form of an EBS volume of 100GB (or more if I want to do RAID/LVM etc.) and attach it to my EC2 instance. Now I can install the OS on my EBS volume. Partitions are all created on my EBS volume. When the EC2 instance reboots, my data stays as-is.
Disk scaling
Let's say I began my business with an EC2 instance + 100GB of EBS volume. All's well until my customers began to upload really large files. My disk is getting full and I need to expand a partition. With AWS, I could buy another slice of 100GB of EBS volume and expand my partition to use this additional 100GB.
Server scaling
Let's say my business is doing really well and my EC2 instance isn't keeping up with traffic. I need more horse-power and I choose to add another server WEB2 running Apache+PHP server with its own EBS volume. But what about customer data? Will I store some data on WEB1 and some on WEB2? That'd be hard to reconcile.
Keeping code same on WEB1 and WEB2
Code from Git (or the version control system of choice) will be deployed to both WEB1 and WEB2 simultaneously. That will keep the code on both servers up to date. Configuration management of my servers can happen through Ansible/Puppet/Chef.
Streamlining data storage
I have some options. Let's discuss two options that will allow WEB1 and WEB2 to share data/disk space. Important note - an EBS volume cannot be shared like a network drive among multiple EC2 instances. In the general case an EBS volume is attached to only one EC2 instance at a time.
First option - stand up another server DATA1 and attach a large EBS volume to it and move customer files there. WEB1 and WEB2 will send customer data to DATA1 (rsync/ftp/scp). WEB1 and WEB2 will read/write from DB1 database also. I could even safeguard my data by taking snapshots of EBS volume and replaying the snapshot on another server called DATA2 in a different AWS region or availability zone in case DATA1 is unavailable.
Second option - AWS has S3 storage. It's reliable and cheaper than EBS. Instead of standing up DATA1 and DATA2, it is much easier and cheaper to create a bucket on S3 and store customer data there. WEB1 and WEB2 can read/write to S3 seamlessly.
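A sketch of the second option in Python follows, assuming boto3 with `s3` standing in for `boto3.client("s3")`. The function name, the bucket name you pass and the `customers/<id>/` key layout are hypothetical choices for this example, not anything S3 requires.

```python
import os


def store_customer_file(s3, bucket, customer_id, local_path):
    """Upload one customer file to S3 and return the object key used.

    `s3` is expected to behave like boto3.client("s3").
    """
    key = f"customers/{customer_id}/{os.path.basename(local_path)}"
    # upload_file(Filename, Bucket, Key) copies the local file to the bucket.
    s3.upload_file(local_path, bucket, key)
    return key
```

Because WEB1 and WEB2 would both write to the same bucket with the same key scheme, either server can later read a file regardless of which one stored it.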
Where're my disks on AWS?
I don't know, and I don't need to know. AWS must have racks and racks of disks. I am getting a slice of disk space from somewhere there. Their disks are likely to have redundancy but EBS failures are possible. For our own sanity, it is good to RAID and snapshot EBS volumes over and above taking regular backups.
Similar to disks, AWS must have racks and racks of servers. I am getting a virtual machine in the form of an EC2 instance of my choice from somewhere in those racks. When I shut down and restart the EC2 server, I may get a VM of the same specification from a different rack. However, my EBS volume will remain the same unless I delete the EBS volume and buy a new one.
One thing to recognize is that if I bought EC2 instance in Oregon, my EBS volume will be in the same Oregon region and also the same availability zone.
Note - this is a very generic answer.

GCE Use Cloud Storage Bucket as Mounted Drive

Is there a way to mount a storage bucket to an instance so it can be used by the webserver as storage? If not, how can I add more storage to the instance without adding another persistent disk with an OS?
Aside from attaching a new persistent disk, you could also use a number of FUSE based utilities to mount either a Google Cloud Storage or AWS S3 bucket as a local disk.
s3fs:
* Can work with Google Cloud or AWS
* Bucket can be mounted on multiple systems at the same time
* Files are stored as objects on the bucket, so the files can be manipulated externally
* A con is that it can be a little bit slow if you have a lot of files
S3QL:
* Can work with Google Cloud or AWS
* Bucket can be mounted on one system
* Files are stored in a proprietary format, can't be manipulated outside of the mounted filesystem
* Much faster than s3fs for many files
* Doesn't handle network connectivity issues so well (manual fsck and remount if you lose network).
Hope this helps.
You can certainly create a new (larger) Persistent Disk and attach it to your instance as a data disk. This is a very good option, since it keeps your website data separate from your operating system. See the Persistent Disk docs for details on all the options.
In your case:
Create a new Persistent Disk for the data. Pick a size large enough for your data and large enough to get the I/O throughput you want. (See this chart for details)
Attach the disk to your instance.