AWS (or other cloud solution) for migrating large data? - mongodb

I have organized, non-relational data that lives in both a file system and a SQL database. There is an application that queries both sources.
What would be some cloud solutions for storing this data, which equates to about 1TB? I'd like to be able to migrate this data into the cloud solution and alter the application to query the data in the cloud.
So far, I've looked at these AWS options: SimpleDB, DynamoDB, and MongoDB on an EC2 instance with EBS for increased storage.
I've also looked into Azure's Table Storage.
SimpleDB has a 10GB limit. DynamoDB is on SSD and might be overkill for my needs. Did I miss something? Are MongoDB on AWS or Azure Table storage suitable options?

I think the solution depends heavily on your data access patterns.
I've used Azure Table Storage and it's great for many things. I've used DynamoDB and it's also good for quite a few things. Both are good table stores, but both have restrictions around read indexes, querying, and transactions. That's sometimes a show stopper. Both will require retooling your data and all the dependent applications.
For your file storage:
(Cheapest, slowest) Migrate your files to a blob store (Azure Blob Storage or AWS S3) and leave them there. Use S3 as a drive for file access. This is slow, but cheap.
(Performant) Use an EC2 instance with EBS drives and store your files there. Access the data on the local file system. This is durable and performant.
For your relational data, leave it relational and store it in a cloud relational database service (RDS+MySQL, RDS+SQL Server, SQL Azure, etc.).
There's no need to change your applications and their data access patterns just because you're moving to the cloud.
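Before copying 1TB of files into a blob store, it can help to build a manifest of what would be uploaded and how large it is. The following is only a rough sketch of that first step: `plan_upload`, `total_bytes`, and the `archive/` prefix are names made up for illustration, and the actual upload (with whichever SDK your blob store provides) is deliberately left out.

```python
from pathlib import Path


def plan_upload(root, prefix=""):
    """Return (local_path, object_key, size_in_bytes) for every file under root."""
    root_path = Path(root)
    plan = []
    for path in sorted(root_path.rglob("*")):
        if path.is_file():
            # Object keys use forward slashes regardless of the local OS.
            key = prefix + path.relative_to(root_path).as_posix()
            plan.append((str(path), key, path.stat().st_size))
    return plan


def total_bytes(plan):
    """Sum the sizes in a plan, e.g. to estimate storage and transfer cost."""
    return sum(size for _, _, size in plan)
```

Calling `plan_upload("/data/files", prefix="archive/")` (a hypothetical path) gives you a list you can review, split into batches, or compare against the bucket after the copy.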

Related

AWS platform. Picking the right technologies

I am building an app that allows people to share items with other people in the community. I wanted to use AWS as my platform.
My idea was to use React Native for the app, AWS Cognito for authentication, and AWS Lambda for the server calls. A relational database would store data about the items and user data such as geolocation. DynamoDB would handle real-time chat, borrowing requests, and transaction data between users. My primary focus is low cost, and I was thinking of using PostgreSQL for the relational database.
What do you think of my database choices? The PostgreSQL database would of course run on RDS. Is there a flaw in my database plan so far? Any help would be greatly appreciated.
I would probably just use DynamoDB for everything in your application. I don't see a real need for storing some of your data in an RDS database here. However, if you definitely need a relational database, I would suggest Amazon Aurora Serverless so that your entire application uses serverless AWS services. Also, normal relational database connection pools don't work that well in AWS Lambda, so I would suggest using the new Data API.

Increase number of SQL database backups in Google App Engine

I'm wondering whether it's possible to make GAE create more than 7 SQL database backups and how much that would cost. They don't seem to mention this possibility in their documentation.
I'm afraid it's not possible to have more than 7 backups at a time for Cloud SQL. The goal is to use the backups as a last line of defense for restoring a broken database. If you need snapshots over time, you can use the Export functionality to save dumps to a Cloud Storage bucket.
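To keep snapshots over time, one approach is to run a dated export on a schedule. Here is a minimal sketch that only builds the `gcloud sql export sql` command line for one dump; the function name `export_command`, the object naming scheme, and the instance, bucket, and database names are all assumptions for illustration. Running the command (and granting the instance's service account write access to the bucket) is up to you.

```python
from datetime import datetime


def export_command(instance, bucket, database, when):
    """Build the argv for one dated SQL dump to a Cloud Storage bucket."""
    stamp = when.strftime("%Y%m%dT%H%M%SZ")
    # One object per export, grouped by instance, so old dumps are never overwritten.
    uri = f"gs://{bucket}/{instance}/{database}-{stamp}.sql"
    return ["gcloud", "sql", "export", "sql", instance, uri, f"--database={database}"]
```

You could pass the resulting list to `subprocess.run` from a cron job, and use a bucket lifecycle rule to expire old dumps.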

NoSQL as local storage for logging and tracing

We are developing an application that will run on many physical servers. We want to use NoSQL for logging and tracing since it does not require structured data.
We don't want to have centralized logging.
Can we install a NoSQL database (any one) on each server and store logging/tracing details there? Will NoSQL impact my actual processes on the server? Is it a good idea to do this?
Problem 1: Data Collection
Many people use NoSQL solutions for storing application logs. The first challenge you may face is how to collect huge amounts of data from various sources reliably and with little management overhead. One concern with not having a log collection layer is database lock contention caused by high write throughput.
So having a log collection layer is generally recommended. There are several open-source log collectors, such as syslog, Fluentd, Scribe, and Flume :)
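To illustrate why a collection layer helps, here is a small in-process analogy using Python's standard `logging.handlers.QueueHandler` and `QueueListener`. It is not Fluentd or any of the collectors above, just a sketch of the same idea: application threads only enqueue records, and a single background thread drains them to the slow sink, so writers don't contend on the database. `build_buffered_logger` and `ListSink` are made-up names, and `ListSink` stands in for whatever store you would actually write to.

```python
import logging
import logging.handlers
import queue


class ListSink(logging.Handler):
    """Stand-in for a database or forwarder: remembers the messages it receives."""

    def __init__(self):
        super().__init__()
        self.messages = []

    def emit(self, record):
        self.messages.append(record.getMessage())


def build_buffered_logger(name, sink):
    """Return (logger, listener): the logger enqueues, the listener drains to sink."""
    log_queue = queue.Queue(-1)  # unbounded buffer between the app and the sink
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    logger.propagate = False  # keep records out of the root logger's handlers
    logger.addHandler(logging.handlers.QueueHandler(log_queue))
    listener = logging.handlers.QueueListener(log_queue, sink)
    listener.start()  # one background thread performs all writes to the sink
    return logger, listener
```

Call `listener.stop()` at shutdown so queued records are flushed to the sink before the process exits.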
Problem 2: Storage & Processing
The next big problem is how to store and process the data. The backend infrastructure requires a lot of changes as the data volume increases. At first you can use MongoDB to store all of your data, but at some point you may end up using Apache Hadoop to build a massively scalable architecture.
Here's an example architecture that uses Fluentd for log collection and MongoDB for log storage and processing.
Here are some links on putting Apache logs into Amazon S3, MongoDB, or Hadoop HDFS with Fluentd.
Store Apache Logs into Amazon S3
Store Apache Logs into MongoDB
Fluentd + HDFS: Instant Big Data Collection
Disclaimer: I'm a committer on the Fluentd project.
This is definitely a good idea, and NoSQL suits it better than SQL.
In logging and tracing, the volume of data is high and so is the rate at which it is retrieved.
For logging and tracing you need complex reports for analysis, so NoSQL is the better fit for you.
NoSQL also supports distributed environments, so you can build your infrastructure across different geographic locations.

Azure Table Vs MongoDB on Azure

I want to use a NoSQL database on Windows Azure, and the data volume will be very large. Which offers better performance and scalability: Azure Table storage or a MongoDB database running in a Worker role? Has anyone used MongoDB on Azure using a Worker role? Please share your thoughts on using MongoDB on Azure versus Azure Table storage.
Table Storage is a core Windows Azure storage feature, designed to be scalable (500TB per account), durable (triple-replicated in the data center, optionally georeplicated to another data center), and schemaless (each row may contain any properties you want). A row is located by partition key + row key, providing very fast lookup. All Table Storage access is via a well-defined REST API usable through any language (with SDKs, built on top of the REST APIs, already in place for .NET, PHP, Java, Python & Ruby).
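To make the partition key + row key model concrete, here is a toy in-memory sketch. It is not the Azure SDK or the REST API, only an illustration of how a two-part key gives direct lookup and how rows in the same table can carry different properties. `TableSketch` and its method names are invented for this example.

```python
class TableSketch:
    """Toy model of a schemaless table addressed by (partition key, row key)."""

    def __init__(self):
        self._partitions = {}

    def upsert(self, partition_key, row_key, **properties):
        # Each row is just a bag of properties; no schema is enforced.
        self._partitions.setdefault(partition_key, {})[row_key] = dict(properties)

    def get(self, partition_key, row_key):
        # A point lookup needs both keys and never scans other rows.
        return self._partitions.get(partition_key, {}).get(row_key)

    def scan_partition(self, partition_key):
        # Rows within one partition come back ordered by row key.
        rows = self._partitions.get(partition_key, {})
        return [rows[key] for key in sorted(rows)]
```

The design consequence is that anything you cannot express as "this partition, this row" (or a range within one partition) becomes a scan, which is why key design matters so much with this kind of store.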
MongoDB is a document-oriented database. To run it in Azure, you need to install MongoDB onto a web/worker role or Virtual Machine, point it to a cloud drive (thereby providing a drive letter) or attached disk (for Windows/Linux Virtual Machines), optionally turn on journaling (which I'd recommend), and optionally define an external endpoint for your use (or access it via virtual network). The Cloud Drive / attached disk, by the way, is actually stored in an Azure Blob, giving you the same durability and georeplication as Azure Tables.
When comparing the two, remember that Table Storage is Storage-as-a-Service: you simply access a well-known REST endpoint. With MongoDB, you're responsible for maintaining the database (e.g. whenever MongoDB Inc (formerly 10gen) pushes out a new version of MongoDB, you'll need to update your server accordingly).
Regarding MongoDB Inc's alpha version pointed to by jtoberon: If you take a close look at it, you'll see a few key things:
The setup is for a standalone MongoDB instance, without replica sets or shards. Regarding replica sets, you still get several benefits using the standalone version, due to the way Blob storage works.
To provide high-availability, you can run with multiple instances. In this case, only one instance serves the database, and one is a 'warm-standby' that launches the mongod process as soon as the other instance fails (for maintenance reboot, hardware failure, etc.).
While 10gen's Windows Azure wrapper is still considered 'alpha,' mongod.exe is not. You can launch mongod.exe just like you'd launch any other Windows exe. It's just the management code around the launching, and that's what the alpha implementation is demonstrating.
EDIT 2011-12-8: This is no longer in an alpha state. You can download the latest MongoDB+Windows Azure project here, which provides replica-set support.
For performance, I think you'll need to do some benchmarking. Having said that, consider the following:
When accessing either Table Storage or MongoDB from, say, a Web Role, you're still reaching out to the Windows Azure Storage system.
MongoDB uses lots of memory for its own cache. For this reason, lots of high-scale MongoDB systems are deployed to larger instance sizes. For Table Storage access, you won't have the same memory-size consideration.
EDIT April 7, 2015
If you want to use a document-based database as-a-service, Azure now offers DocumentDB.
I have used both.
Azure Tables: dead simple, fast, really hard to write even simple queries.
Mongo: runs nicely, lots of querying capabilities, requires several instances to be reliable.
In a nutshell,
If your queries are really simple (key->value), you should run a cost comparison (mainly the number of transactions against the storage versus the cost of hosting Mongo on Azure). I would rather go with Table Storage in that case.
If you need more elaborate queries and don't want to go to SQL Azure, Mongo is likely your best bet.
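The cost comparison mentioned above is simple arithmetic, sketched below. Every price and unit here is a placeholder you would need to replace with figures from the current pricing pages; the function names and the 730-hours-per-month default are assumptions for illustration, not Azure's actual billing model.

```python
def monthly_table_cost(transactions, price_per_10k_transactions, stored_gb, price_per_gb):
    """Estimated monthly Table Storage bill: transaction charges plus stored data."""
    return transactions / 10_000 * price_per_10k_transactions + stored_gb * price_per_gb


def monthly_mongo_cost(instances, price_per_instance_hour, hours=730):
    """Estimated monthly bill for self-hosted Mongo: instance count times hourly rate."""
    return instances * price_per_instance_hour * hours
```

Remember that a reliable Mongo deployment needs several instances, so `instances` is rarely 1, and that the Mongo figure leaves out the time you spend operating it.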
I realize that this question is dated. I'd like to add the following info for those who may come upon this question in their searches.
Note that now, MongoDB is offered as a fully managed service on Azure. (officially in Beta as of Apr '15)
See:
http://www.mongodb.com/partners/cloud/microsoft
or
https://azure.microsoft.com/en-us/blog/announcing-new-mongodb-instances-on-microsoft-azure/
See (including pricing):
https://azure.microsoft.com/en-us/marketplace/partners/mongolab/mongolab/
My first choice is Azure Tables because of the SaaS model, the low cost, and the 99.99% SLA: http://alexandrebrisebois.wordpress.com/2013/07/09/what-if-20000-windows-azure-storage-transactions-per-second-isnt-enough/
Some limits:
http://msdn.microsoft.com/en-us/library/windowsazure/jj553018.aspx
http://www.windowsazure.com/en-us/pricing/calculator/?scenario=data-management
or Azure SQL for a small business
DocumentDB
http://azure.microsoft.com/en-us/documentation/services/documentdb/
http://azure.microsoft.com/en-us/documentation/articles/documentdb-limits/
My second choice is the storage that many cloud providers offer, such as Amazon S3
or Google tables https://developers.google.com/bigquery/pricing
My Nth choice is to manage the whole show myself with MongoDB and get no sleep. Before doing that, I would look again at the first two SaaS options.
If I am running in the cloud, I go for the SaaS model as much as possible: rent it.
The question is what my app needs: Azure Tables, DocumentDB, or Azure SQL?
DocumentDB documentation
http://azure.microsoft.com/en-us/documentation/services/documentdb/
How Azure pricing works
http://azure.microsoft.com/en-us/pricing/details/documentdb/
this is fun
http://www.documentdb.com/sql/demo
At Build 2016 it was announced that DocumentDB would support all MongoDB drivers. This solves some of the lack of tooling issues with DocDB and also makes it easier to migrate Mongo apps.
Above answers are all good - but the real answer depends on what your requirements are. You need to understand what size of data you are processing, what types of operations you want to perform on the data and then select the solution that meets your needs.
One thing to remember is that Azure Table Storage doesn't support complex data types. Every property in an entity must be a primitive such as a string, number, boolean, or date.
You can't store an object against a key, which I feel is a must for a NoSQL DB.
https://learn.microsoft.com/en-us/rest/api/storageservices/fileservices/understanding-the-table-service-data-model (scroll to Property Types)
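A common workaround, sketched here rather than taken from the answer above, is to serialize nested values to a JSON string before storing them and parse them back on read. The `_json` suffix convention and the function names are assumptions for this example; the stored string is still subject to the service's property size limits, and you lose the ability to filter on fields inside it.

```python
import json

PRIMITIVES = (str, int, float, bool)


def to_entity(partition_key, row_key, obj):
    """Flatten a dict into primitive-only properties, JSON-encoding anything nested."""
    entity = {"PartitionKey": partition_key, "RowKey": row_key}
    for name, value in obj.items():
        if isinstance(value, PRIMITIVES):
            entity[name] = value
        else:
            # Lists, dicts, and None are stored as a JSON string under a marked name.
            entity[name + "_json"] = json.dumps(value, sort_keys=True)
    return entity


def from_entity(entity):
    """Reverse to_entity: drop the keys and decode the JSON-encoded properties."""
    obj = {}
    for name, value in entity.items():
        if name in ("PartitionKey", "RowKey"):
            continue
        if name.endswith("_json"):
            obj[name[: -len("_json")]] = json.loads(value)
        else:
            obj[name] = value
    return obj
```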

Data Access Layer - Switching from Local SQL Database to Cloud Data Storage

I am creating a simple application and am stuck on the data storage option. To begin with, I would like to use SQL Server as my data store. I will not be using any special features of SQL Server; it's just plain tables with CRUD operations.
Later, I should be able to switch the underlying data store to either SQL Data Services or Amazon S3 by changing a few configuration parameters.
Is this possible? If yes, can anyone provide high-level guidance on how to go about it? Do I need to use Entity Framework to begin with SQL Server? Does Entity Framework support SQL Data Services? Is there any common component that supports both SQL Data Services and Amazon S3?
Too many questions!!!
Thanks for the help in advance.
The closest ORM I know of is LightSpeed. I've never used it though. Personally, if the end goal is to use cloud storage, I'd probably just use cloud storage from the get go...
If you are going to use Amazon's SimpleDB, M/Gateway has an open source database that mimics their API.
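If you do want the store to be switchable by configuration, the usual shape is a small storage interface with one implementation per backend and a factory that picks one from config. Below is a minimal sketch in Python, assuming plain key-based CRUD: `sqlite3` stands in for SQL Server, an in-memory dict stands in for a cloud key-value store, and all class, function, and config key names are made up for illustration.

```python
import sqlite3


class SqlItemStore:
    """CRUD over a relational table (sqlite3 here as a stand-in for SQL Server)."""

    def __init__(self, path=":memory:"):
        self._conn = sqlite3.connect(path)
        self._conn.execute(
            "CREATE TABLE IF NOT EXISTS items (id TEXT PRIMARY KEY, body TEXT)"
        )

    def put(self, item_id, body):
        self._conn.execute(
            "INSERT OR REPLACE INTO items (id, body) VALUES (?, ?)", (item_id, body)
        )
        self._conn.commit()

    def get(self, item_id):
        row = self._conn.execute(
            "SELECT body FROM items WHERE id = ?", (item_id,)
        ).fetchone()
        return row[0] if row else None

    def delete(self, item_id):
        self._conn.execute("DELETE FROM items WHERE id = ?", (item_id,))
        self._conn.commit()


class InMemoryItemStore:
    """Same interface over a dict (a stand-in for a cloud key-value or blob store)."""

    def __init__(self):
        self._items = {}

    def put(self, item_id, body):
        self._items[item_id] = body

    def get(self, item_id):
        return self._items.get(item_id)

    def delete(self, item_id):
        self._items.pop(item_id, None)


STORES = {"sql": SqlItemStore, "memory": InMemoryItemStore}


def make_store(config):
    """Pick the backend named by the (hypothetical) 'store' configuration key."""
    return STORES[config["store"]]()
```

The application only ever calls `put`, `get`, and `delete`, so swapping backends is a config change. The catch is that the interface has to be limited to what the weakest backend supports, which is why joins and ad hoc queries don't survive this kind of abstraction.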