Where do I get the exact used storage for a DocumentDB cluster - MongoDB

I have created one DocumentDB cluster in AWS with two instances running in it, but I need to know the exact storage that will be used for storing the data, and also how AWS charges for one cluster.

When you provision an Amazon DocumentDB cluster, you don’t need to specify how much storage or I/Os you need for your cluster. Amazon DocumentDB uses a unique storage system that automatically scales from 10 GB up to 64 TB of data per cluster in 10 GB increments.
Storage is at the cluster level, which means all your instances share the storage. You can view how much storage you are using by monitoring the VolumeBytesUsed metric in the Monitoring tab of your Amazon DocumentDB console. Storage in DocumentDB is priced as low as $0.02 per GB/month (prices may vary across AWS regions). Details here - https://aws.amazon.com/documentdb/pricing/. To see how much you are paying for storage, you can also go to the AWS Billing console and view the details of your DocumentDB bill.
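If you'd rather pull the number programmatically, here is a minimal boto3 sketch that reads the same VolumeBytesUsed metric from CloudWatch. The cluster identifier my-docdb-cluster and the region are placeholders you'd replace with your own.

```python
# Minimal sketch: read VolumeBytesUsed for a DocumentDB cluster from CloudWatch.
# "my-docdb-cluster" and the region are placeholders.
from datetime import datetime, timedelta

import boto3

cloudwatch = boto3.client("cloudwatch", region_name="us-east-1")

response = cloudwatch.get_metric_statistics(
    Namespace="AWS/DocDB",
    MetricName="VolumeBytesUsed",
    Dimensions=[{"Name": "DBClusterIdentifier", "Value": "my-docdb-cluster"}],
    StartTime=datetime.utcnow() - timedelta(hours=1),
    EndTime=datetime.utcnow(),
    Period=300,                 # 5-minute datapoints
    Statistics=["Average"],
)

# Print used storage in GiB, oldest datapoint first.
for point in sorted(response["Datapoints"], key=lambda p: p["Timestamp"]):
    print(point["Timestamp"], point["Average"] / (1024 ** 3), "GiB")
```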

Related

Migrate to Kubernetes

We're planning to migrate our software to run in Kubernetes with autoscaling; this is our current infrastructure:
PHP and Apache are running on Google Compute Engine n1-standard-4 (4 vCPUs, 15 GB memory)
MySQL is running in Google Cloud SQL
Data files (CSV, PDF) and the code are stored on a single SSD persistent disk
I found many posts that recommend storing the data files in Google Cloud Storage and using the API to fetch files and upload them to the bucket. We have very limited time, so I decided to use NFS to share the data files across the pods. The problem is that NFS is slow: it's around 100mb/s when I copy a file with pv, while the result from iperf is 1.96 Gbits/sec. Do you know how to achieve the same result without implementing Cloud Storage, or how to increase the NFS speed?
Data files (CSV, PDF) and the code are stored on a single SSD persistent disk
There's nothing stopping you from volume mounting an SSD into the Pod so you can continue to use an SSD. I can only speak to AWS terminology, but some EC2 instances come with "local" SSD hardware, and thus you would only need to use a nodeSelector to ensure your Pods were scheduled onto machines that had said local storage available.
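As a rough sketch of that nodeSelector idea (using the official Kubernetes Python client rather than raw YAML): the disktype=local-ssd label, the hostPath /mnt/disks/ssd0, and the php:apache image are all assumptions you'd replace with whatever labels and mounts exist on your nodes.

```python
# Sketch: schedule a Pod only onto nodes carrying a hypothetical
# "disktype=local-ssd" label, and mount a local SSD path into it.
from kubernetes import client, config

config.load_kube_config()  # or config.load_incluster_config() inside a cluster

pod = client.V1Pod(
    metadata=client.V1ObjectMeta(name="php-apache"),
    spec=client.V1PodSpec(
        node_selector={"disktype": "local-ssd"},  # only nodes with this label
        containers=[
            client.V1Container(
                name="php-apache",
                image="php:apache",
                volume_mounts=[
                    client.V1VolumeMount(name="data", mount_path="/var/www/data")
                ],
            )
        ],
        volumes=[
            client.V1Volume(
                name="data",
                # Assumed local-SSD mount point on the node.
                host_path=client.V1HostPathVolumeSource(path="/mnt/disks/ssd0"),
            )
        ],
    ),
)

client.CoreV1Api().create_namespaced_pod(namespace="default", body=pod)
```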
Where you're going to run into problems is if you are currently just using one php+apache and thus just one SSD, but now you want to scale the application up and it requires that all php+apache have access to the same SSD. That's a classic distributed application architecture problem, and something kubernetes itself can't fix for you.
If you're willing to expend the effort, you can also try any one of the other distributed filesystems (Ceph, GlusterFS, etc) and see if they perform better for your situation. Then again, "We have very limited time" I guess pretty much means that's off the table.

Verify from a local system if an Amazon EC2 instance is in use

We have multiple Amazon EC2 instances (Windows). How can I find out the state of an instance, i.e. whether the instance is busy or idle?
I guess it might be possible via PowerShell, but I am not very sure about this as I don't know much PowerShell.
Amazon EC2 instances send metrics to Amazon CloudWatch, including:
CPU Utilization
Disk access
Network traffic
See documentation: Amazon Elastic Compute Cloud Dimensions and Metrics
This information can be retrieved from CloudWatch by using the get-metric-statistics API call or via the AWS Command-Line Interface (CLI). An alarm can also be configured in CloudWatch to trigger a notification when a threshold is passed (e.g. below 10% CPU utilization for 15 minutes).
See documentation: Creating Amazon CloudWatch Alarms
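For illustration, here is a minimal boto3 sketch of that "idle" alarm (below 10% average CPU for 15 minutes). The instance ID, SNS topic ARN and region are placeholders.

```python
# Sketch: alarm when an EC2 instance averages below 10% CPU for 15 minutes.
# The instance ID, SNS topic ARN and region are placeholders.
import boto3

cloudwatch = boto3.client("cloudwatch", region_name="us-east-1")

cloudwatch.put_metric_alarm(
    AlarmName="i-0123456789abcdef0-idle",
    Namespace="AWS/EC2",
    MetricName="CPUUtilization",
    Dimensions=[{"Name": "InstanceId", "Value": "i-0123456789abcdef0"}],
    Statistic="Average",
    Period=300,               # 5-minute periods...
    EvaluationPeriods=3,      # ...three of them = 15 minutes
    Threshold=10.0,
    ComparisonOperator="LessThanThreshold",
    AlarmActions=["arn:aws:sns:us-east-1:123456789012:notify-me"],
)
```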
If you wish to capture additional metrics, such as free disk space or free memory, you can send log files to CloudWatch Logs.
See documentation: Sending Performance Counters to CloudWatch and Logs to CloudWatch Logs Using Amazon EC2 Simple Systems Manager

Google Cloud SQL CPU Monitoring

I'm working on trying to set up some monitoring on a Google Cloud SQL node and am not seeing how to do it. I was able to install the monitoring agent on my Google Compute Engine instances to monitor CPU, Network, etc. I have not been able to figure out how to do so on the Cloud SQL instance. I have access to these types of monitoring:
Storage Usage (GB)
Number of Read/Write operations
Egress Bytes
Active Connections
MySQL Queries
MySQL Questions
InnoDB Pages Read/Written (pages/sec)
InnoDB Data fsyncs (operations/sec)
InnoDB Log fsyncs (operations/sec)
I'm sure these are great options, but at this point all I want to pay attention to is whether my node is performing well from a CPU/RAM standpoint, as those seem to be the first and foremost measures of performance.
If I'm missing something, or misunderstanding what I'm trying to do, any advice is appreciated.
Thanks!
Google has Stackdriver, which is for logging and monitoring Google and AWS cloud infrastructure. It can monitor just about everything on GCP. You can create visualizations to monitor your Cloud SQL instance in one dashboard. You just have to:
1. Log in to Stackdriver and go to any existing dashboard; if you don't have one, create one.
2. Add a chart and select Cloud SQL as the resource name.
3. Select CPU Utilization as the metric and save. You can also monitor memory, disk I/O, the delta count of queries, server uptime, and much more.
If you want to monitor any other GCP resource (Compute Engine, App Engine, Kubernetes Engine, a storage bucket, Bigtable, or Pub/Sub), you just have to select the appropriate resource name from the list. Hope you got your answer.
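If you prefer to pull the numbers programmatically, here is a minimal sketch using the current Cloud Monitoring Python client (Stackdriver's successor). The project ID my-project is a placeholder; cloudsql.googleapis.com/database/cpu/utilization is the documented Cloud SQL CPU metric.

```python
# Sketch: fetch the last hour of Cloud SQL CPU utilization from Cloud Monitoring.
# "my-project" is a placeholder project ID.
import time

from google.cloud import monitoring_v3

client = monitoring_v3.MetricServiceClient()
now = int(time.time())
interval = monitoring_v3.TimeInterval(
    {"end_time": {"seconds": now}, "start_time": {"seconds": now - 3600}}
)

results = client.list_time_series(
    request={
        "name": "projects/my-project",
        "filter": 'metric.type = "cloudsql.googleapis.com/database/cpu/utilization"',
        "interval": interval,
        "view": monitoring_v3.ListTimeSeriesRequest.TimeSeriesView.FULL,
    }
)

# Each point is a fractional CPU utilization (0.0 to 1.0) per Cloud SQL instance.
for series in results:
    for point in series.points:
        print(point.interval.end_time, point.value.double_value)
```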
You can view all of them directly from the "Overview" tab of the Cloud SQL console.
I have added this as a feature request, issue 110:
https://code.google.com/p/googlecloudsql/issues/detail?id=110

How can I store MongoDB collections in Google Cloud Storage directly?

I have created an instance in Google Compute Engine. Using SSH, I am able to do operations with MongoDB. Now I want the MongoDB collections to be stored in Google Cloud Storage. How can I do that? I have searched for it but found no help.
Google Cloud Storage is not suitable for applications requiring a filesystem, such as MongoDB -- the various "filesystem simulations" on top of GCS introduce just too much overhead to be deemed acceptable (at least by my personal standards).
Rather, to use MongoDB on GCE, use persistent disks on your GCE instances, either regular ones (cheaper) or SSD ones (costlier, faster).
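If what you actually want is copies of your collections in GCS (for backup or archival rather than live storage), a common pattern is to dump them and upload the archive. A rough sketch, assuming mongodump is installed, the google-cloud-storage library is available, and my-backup-bucket is a placeholder bucket name:

```python
# Sketch: dump MongoDB to a gzipped archive and upload it to a GCS bucket.
# Assumes mongodump is on PATH and "my-backup-bucket" already exists.
import subprocess
from datetime import datetime

from google.cloud import storage

archive = f"/tmp/mongo-{datetime.utcnow():%Y%m%d-%H%M%S}.gz"

# Dump all databases into a single gzipped archive file.
subprocess.run(["mongodump", "--archive=" + archive, "--gzip"], check=True)

# Upload the archive to Google Cloud Storage.
bucket = storage.Client().bucket("my-backup-bucket")
bucket.blob("mongodb/" + archive.rsplit("/", 1)[-1]).upload_from_filename(archive)
```

A cron job running something like this gives you point-in-time copies in GCS while the live database keeps running on persistent disk.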

Horizontal Scalability and disk storage

I am trying to understand how horizontal scalability (virtualization) works in terms of disk storage.
Virtualization is a layer on top of the hardware compute nodes that manages the resources needed for requests.
So my question is: what happens when I deploy my WAR into the web server, for example? Do I have replicated storage on different nodes?
After doing some research I found NAS vs. SAN, so I expect there is SAN replication for data stability... is that true?
Where exactly is my storage disk when I have a horizontally scaled platform like Google Compute Engine or AWS?
Thanks,
Hopefully a couple of these examples will help. Let's take a general, crude example. I'll try to keep information simple to understand. Let's say I have a business running on LAMP stack. Apache+PHP is running on WEB1 server, MySQL on DB1 server. Customer data sits on WEB1.
SAN replication
First - your question about replication. That's mostly for disaster recovery. For data stability/reliability, SANs have appropriate RAID levels, service level agreements and spare disks. For example, RAID5 tolerates the failure of 1 disk in a raid-set, RAID6 tolerates the failure of 2 disks in a raid-set, etc. Hot-spare disks help quickly repopulate a failed RAID disk. Organizations also snapshot their disk volumes and replay them in a different data-center so as to have a 2nd copy of their data. This is done over and above regular backups and VM snapshots.
AWS disks
AWS has two types of disks:
Ephemeral: disks connected to EC2 instance
Elastic Block Storage (EBS)
Ephemeral storage
Don't use this for anything critical. AWS offers EC2 instances with ephemeral storage (that means the VM has disks attached to the server) and recommends that users purchase a slice of disk in the form of EBS (Elastic Block Storage). I'd choose not to run anything on ephemeral storage, because if the EC2 instance stops, information on ephemeral storage is gone! However, if my partitions were on an EBS volume, an EC2 restart would be seamless. All data stays alive on my EBS volume.
EBS
When I want a VM, I'd choose an EC2 instance (CPU/Memory). Then I buy disk in the form of an EBS volume of 100GB (or more if I want to do RAID/LVM etc.) and attach it to my EC2 instance. Now I can install the OS on my EBS volume. Partitions are all created on my EBS volume. When EC2 reboots, my data stays as-is.
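A rough boto3 sketch of that "buy a disk and attach it" step; the instance ID, availability zone, region and device name are placeholders.

```python
# Sketch: create a 100 GiB EBS volume and attach it to an existing EC2 instance.
# The instance ID, availability zone, region and device name are placeholders.
import boto3

ec2 = boto3.client("ec2", region_name="us-west-2")

volume = ec2.create_volume(
    AvailabilityZone="us-west-2a",   # must match the instance's AZ
    Size=100,                        # GiB
    VolumeType="gp3",
)

# Wait for the volume to become available before attaching it.
ec2.get_waiter("volume_available").wait(VolumeIds=[volume["VolumeId"]])

ec2.attach_volume(
    VolumeId=volume["VolumeId"],
    InstanceId="i-0123456789abcdef0",
    Device="/dev/sdf",               # then partition/format/mount it inside the VM
)
```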
Disk scaling
Let's say I began my business with an EC2 instance + 100GB of EBS volume. All's well until my customers begin to upload really large files. My disk is getting full and I need to expand a partition. With AWS, I could buy another slice of 100GB of EBS volume and expand my partition to use this additional 100GB.
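As an alternative to attaching a second slice, EBS volumes can also be grown in place (Elastic Volumes). A minimal sketch, with a placeholder volume ID; after the API call you still have to extend the partition and filesystem inside the instance (e.g. growpart plus resize2fs or xfs_growfs).

```python
# Sketch: grow an existing EBS volume in place from 100 GiB to 200 GiB.
# "vol-0123456789abcdef0" is a placeholder volume ID; the filesystem must
# still be extended inside the instance afterwards.
import boto3

ec2 = boto3.client("ec2", region_name="us-west-2")
ec2.modify_volume(VolumeId="vol-0123456789abcdef0", Size=200)
```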
Server scaling
Let's say my business is doing really well and my EC2 instance isn't keeping up with traffic. I need more horse-power and I choose to add another server WEB2 running Apache+PHP server with its own EBS volume. But what about customer data? Will I store some data on WEB1 and some on WEB2? That'd be hard to reconcile.
Keeping code same on WEB1 and WEB2
Code from Git (or a version control system of choice) will be deployed to both WEB1 and WEB2 simultaneously. That keeps both servers' code up to date. Configuration management of my servers can happen through Ansible/Puppet/Chef.
Streamlining data storage
I have some options. Let's discuss two options that will allow WEB1 and WEB2 to share data/disk space. Important note - an EBS volume cannot be shared by multiple EC2 instances; it can be attached to only one EC2 instance at a time.
First option - stand up another server DATA1 and attach a large EBS volume to it and move customer files there. WEB1 and WEB2 will send customer data to DATA1 (rsync/ftp/scp). WEB1 and WEB2 will read/write from DB1 database also. I could even safeguard my data by taking snapshots of EBS volume and replaying the snapshot on another server called DATA2 in a different AWS region or availability zone in case DATA1 is unavailable.
Second option - AWS has S3 storage. It's reliable and cheaper than EBS. Instead of standing up DATA1 and DATA2, it is much easier and cheaper to create a bucket on S3 and store customer data there. WEB1 and WEB2 can read/write to S3 seamlessly.
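For illustration, a minimal boto3 sketch of WEB1/WEB2 pushing a customer file to that shared bucket; the bucket name, local path and object key are placeholders.

```python
# Sketch: upload a customer file from WEB1/WEB2 to a shared S3 bucket.
# "my-customer-data" and the file paths are placeholders.
import boto3

s3 = boto3.client("s3")
s3.upload_file(
    Filename="/var/www/uploads/invoice-42.pdf",
    Bucket="my-customer-data",
    Key="customers/invoice-42.pdf",
)
```

Because both web servers talk to the same bucket, there is no DATA1/DATA2 to maintain and no rsync step to reconcile.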
Where're my disks on AWS?
I don't know, and I don't need to know. AWS must have racks and racks of disks. I am getting a slice of disk space from somewhere there. Their disks are likely to have redundancy but EBS failures are possible. For our own sanity, it is good to RAID and snapshot EBS volumes over and above taking regular backups.
Similar to disks, AWS must have racks and racks of servers. I am getting a virtual machine in the form of an EC2 instance of my choice from somewhere in those racks. When I shut down and restart the EC2 server, I may get the same specification VM from a different rack. However, my EBS volume will remain the same unless I terminate it and buy a new EBS volume.
One thing to recognize is that if I bought an EC2 instance in Oregon, my EBS volume will be in the same Oregon region and also in the same availability zone.
Note - this is a very generic answer.