Cosmos DB or MongoDB on Azure

I am starting a new .NET Core + NoSQL project. I am free to choose between MongoDB and Cosmos DB on Azure for the database, which will hold at most 10 GB.
If I use Cosmos DB, I will have no maintenance issues, but accessing it through a MongoDB driver seems like it could hide potential issues. I also have no experience with Cosmos DB, while I have worked with MongoDB before. On the other hand, if I set up MongoDB on a Windows or Linux server, I have to take care of the server itself, keep an eye on disk space, fix potential issues, etc.
In terms of maintenance and reliability, which one do you suggest?

As a rule of thumb, always choose the most managed service unless you have a reason not to. You probably answered your own question: in terms of maintenance and reliability you should choose Database-as-a-Service (Cosmos DB), which not only offers a 99.999% high-availability SLA but also lets you grow and distribute globally.
There is a MongoDB API for Cosmos DB; I would give it a try and implement a PoC.
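As a starting point for that PoC, here is a minimal sketch in Python with pymongo; the account name, key, and database/collection names are placeholders, and the connection string follows the general shape the Azure portal provides for the MongoDB API:

    from pymongo import MongoClient

    # Cosmos DB's MongoDB API is reached through an ordinary MongoDB
    # connection string, so the regular driver works unchanged.
    client = MongoClient(
        "mongodb://myaccount:<primary-key>@myaccount.mongo.cosmos.azure.com:10255/"
        "?ssl=true&retrywrites=false"
    )

    db = client["appdb"]
    db["items"].insert_one({"name": "smoke-test"})
    print(db["items"].find_one({"name": "smoke-test"}))

If this round-trips, the rest of your existing MongoDB experience should carry over largely unchanged.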

Related

Horizontally Scaling Database Guide

We want to horizontally scale our existing MongoDB database, which is running on one server. Due to an increased user base we can't scale it vertically anymore, so we need to scale it horizontally through sharding.
MongoDB provides a good tutorial on sharding, but we need to get it done in a short amount of time, and we are not experts in this.
It seems there are multiple options available, like Google Cloud and Amazon RDS. All we want is to keep using our own database but achieve sharding through some other service.
So my questions are:
1. Is it possible to build a fail-safe cluster architecture in less than a week using MongoDB sharding, with a team that has no prior experience in this?
2. If not, do services like Google Cloud SQL and Amazon RDS provide a mechanism to use our database with their sharding service?
Can anyone with expertise in this guide me in the right direction?
I tried MongoDB Atlas and it looks pretty good: https://www.mongodb.com/cloud/atlas
It creates a cluster for you by default.
Maybe you can give it a try:
MongoDB Atlas delivers the world’s leading database for modern applications as a fully automated cloud service engineered and run by the same team that builds the database. Proven operational and security practices are built in, automating time-consuming administration tasks such as infrastructure provisioning, database setup, ensuring availability, global distribution, backups, and more. The easy-to-use UI and API let you spend more time building your applications and less time managing your database.
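If a self-managed cluster is still on the table, here is a hedged sketch in Python with pymongo of what the tutorial's final step boils down to once the config servers, shards, and mongos routers are already running (the mongos-host router and the shop.orders namespace are hypothetical):

    from pymongo import MongoClient

    # Connect to the mongos router, not to an individual shard.
    client = MongoClient("mongodb://mongos-host:27017")

    # Enable sharding for the database, then shard the collection on a
    # hashed key for an even write distribution across shards.
    client.admin.command("enableSharding", "shop")
    client.admin.command(
        "shardCollection", "shop.orders",
        key={"customer_id": "hashed"},
    )

The work Atlas automates is everything before these two commands: provisioning and wiring up the config server replica set, the shard replica sets, and the routers.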

Is it required to shard a MongoDB cluster to make it available worldwide?

Our entire database replica set is located in Europe.
We started getting some Asian clients, and they are complaining about slowness when using our web application. We've deployed a set of app servers in Asia, but they are still complaining about performance, so we realized the problem could be related to the databases being in Europe.
We've run some tests using a single replica in Asia; the reads were fast, but the writes were still very slow.
Master-master replication is impossible with MongoDB, so we're looking at sharding our database.
But is this the only way to make the database fully available in different regions?
Cheers
Yes (most probably).
Here is the link to support this thought: https://docs.mongodb.com/manual/sharding/#advantages-of-sharding
I didn't find any other solution in the official documentation for your use case.
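The documented mechanism for keeping writes region-local is zone (tag-aware) sharding. A hedged sketch in Python with pymongo, where the shard names, the app.users namespace, and the (region, user_id) shard key are all hypothetical:

    from bson.min_key import MinKey
    from bson.max_key import MaxKey
    from pymongo import MongoClient

    admin = MongoClient("mongodb://mongos-host:27017").admin

    # Tag each shard with the region it is deployed in.
    admin.command("addShardToZone", "shardEU", zone="EU")
    admin.command("addShardToZone", "shardAsia", zone="ASIA")

    # Pin each region's key range to its zone so writes for Asian users
    # land on the shard running in Asia.
    admin.command(
        "updateZoneKeyRange", "app.users",
        min={"region": "ASIA", "user_id": MinKey()},
        max={"region": "ASIA", "user_id": MaxKey()},
        zone="ASIA",
    )

With the app servers you already deployed in Asia reading from and writing to the local shard, both reads and writes stay in-region.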

MongoDB: is each cluster on a different server, or are they all on one?

I am starting to use MongoDB, and I am developing my first project with it. I cannot predict what volume of clients and usage it is going to receive, but I want to build it from the beginning to handle high volume.
I have heard about clusters, and I saw the demonstrations on MongoDB's official website.
And here are my questions (cut into smaller sub-questions):
Are clusters different servers, or are they just pieces of one big server?
Maybe this seems a bit unrelated, but how does Facebook, or any huge database, handle its data across countries? I mean, they have users from Asia and from America. Surely they use different servers, but how does the system know how to host, aggregate, and deliver from the right server? Is it automatic, or is it a tool that a third party supplies to such large databases?
If I am using clusters, can I still just insert the data into the database and let Mongo distribute it across the cluster on its own (see the sketch below), or do I have to do that manually?
I have a cloud VPS. Should I continue working with this for Mongo, or should I seriously consider AWS / Google Cloud Platform / etc.?
Another important thing: I'm from Israel, and the clouds I mentioned above are probably hosted in Europe at the nearest, or even farther away.
That will probably cause high latency, won't it?
Thanks.
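On the insertion sub-question, at least, the distribution is automatic: against a sharded cluster the application talks to the mongos router exactly as it would talk to a single server, and MongoDB places documents across the shards on its own. A minimal sketch in Python with pymongo (the host name is a placeholder):

    from pymongo import MongoClient

    # Same driver, same call as against a standalone server; only the
    # host differs (a mongos router instead of a single mongod).
    client = MongoClient("mongodb://mongos-host:27017")
    client["appdb"]["events"].insert_one({"user": "u1", "type": "login"})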

Cassandra vs MongoDB running costs?

We are planning to create a public website and are in the process of choosing a suitable database for it. After discussion, it was suggested we go with a NoSQL database, as that would make scaling easier in the future.
On our website we expect regular writes and a lot of reads, and it seems either Cassandra or MongoDB would suit it best.
Kindly suggest which of Cassandra and MongoDB would be easier on hosting and maintenance and cheaper in hosting charges.
Also, please suggest some providers of good, low-cost hosting for both Cassandra and MongoDB.

What are the pros and cons of DynamoDB with respect to other NoSQL databases?

We use a MongoDB database add-on on Heroku for our SaaS product. Now that Amazon has launched DynamoDB, a cloud database service, I was wondering how that changes the NoSQL offerings landscape.
Specifically for cloud-based services or SaaS vendors, how will using DynamoDB be better or worse compared to, say, MongoDB? Are there any cost, performance, scalability, reliability, driver, community, etc. benefits to using one versus the other?
For starters, it will be fully managed by Amazon's expert team, so you can bet that it will scale very well with virtually no input from the end user (developer).
Also, since it's built and managed by Amazon, you can assume that they have designed it to work very well with their infrastructure, so performance should be top notch. In addition to being specifically built for their infrastructure, they have chosen SSDs as storage, so right from the start disk throughput will be significantly higher than that of other data stores on AWS that are HDD-backed.
I haven't seen any drivers yet, and I think it's too early to tell how the community will react, but I suspect that Amazon will provide drivers for all of the most popular languages and the community will receive this well, and in turn create additional drivers and tools.
Using MongoDB through an add-on for Heroku effectively turns MongoDB into a SaaS product as well.
In reality, one would be comparing whatever service a chosen provider offers against what Amazon can offer, rather than comparing one persistence solution to another.
This is very hard to do. Each provider will have varying levels of service at different price points, and some may consider the option of running the database on their own hardware locally for development purposes a welcome one.
I think the key difference to consider is that MongoDB is software you can install anywhere (including on AWS, another cloud service, or in-house), whereas DynamoDB is a SaaS available exclusively as a hosted service from Amazon (AWS). If you want to retain the option of hosting your application in-house, DynamoDB is not an option. If hosting outside of AWS is not a consideration, then DynamoDB should be your default choice unless very specific features are of higher consideration.
There's a table in the following link that summarizes the attributes of DynamoDB and Cassandra:
http://www.datastax.com/dev/blog/amazon-dynamodb
Something DynamoDB needs in order to become more usable is the ability to index attributes other than the primary key.
UPDATE 1 (06/04/2013)
On 04/18/2013, Amazon announced support for Local Secondary Indexes, which made DynamoDB dramatically more useful:
http://aws.amazon.com/about-aws/whats-new/2013/04/18/amazon-dynamodb-announces-local-secondary-indexes/
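For illustration, a hedged sketch of creating a table with a Local Secondary Index using Python and boto3 (the table and attribute names are hypothetical):

    import boto3

    dynamodb = boto3.client("dynamodb")

    dynamodb.create_table(
        TableName="Orders",
        AttributeDefinitions=[
            {"AttributeName": "customer_id", "AttributeType": "S"},
            {"AttributeName": "order_id", "AttributeType": "S"},
            {"AttributeName": "order_date", "AttributeType": "S"},
        ],
        # Base table: partition key plus sort key.
        KeySchema=[
            {"AttributeName": "customer_id", "KeyType": "HASH"},
            {"AttributeName": "order_id", "KeyType": "RANGE"},
        ],
        # The LSI keeps the same partition key but offers an
        # alternative sort key for range queries.
        LocalSecondaryIndexes=[{
            "IndexName": "by_order_date",
            "KeySchema": [
                {"AttributeName": "customer_id", "KeyType": "HASH"},
                {"AttributeName": "order_date", "KeyType": "RANGE"},
            ],
            "Projection": {"ProjectionType": "ALL"},
        }],
        ProvisionedThroughput={"ReadCapacityUnits": 5, "WriteCapacityUnits": 5},
    )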
I have to be honest; I was very excited when I heard about the new DynamoDB, and I did attend the webinar yesterday. However, it's difficult to make a decision right now, as everything they said was still very vague; I have no idea what functionality will be allowed or usable through their service.
The one thing I do know is that scaling is handled automatically, which is pretty awesome. Yet there are still so many unknowns that it's tough to make a proper analysis until all the facts are in and we can start using it.
Thus far I still see Mongo as working much better for me (personally) in the project I've been working on.
Like most DB decisions, it's really going to come down to a project by project decision of what's best for your need.
I anxiously await more information on the product. For now, though, it is in beta, and I wouldn't jump ship to adopt the latest and greatest only to end up a tester :)
I think one of the key differences between DynamoDB and other NoSQL offerings is the provisioned throughput: you pay for a specific throughput level on a table, and provided you keep your data well partitioned, you can always expect that throughput to be met. So as your application load grows, you can scale up and keep your performance more or less constant.
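To make that concrete, here is a hedged sketch of raising a table's provisioned throughput with Python and boto3 as load grows (the table name and capacity numbers are placeholders):

    import boto3

    dynamodb = boto3.client("dynamodb")

    # Raise the provisioned capacity; DynamoDB applies the change
    # online, without downtime.
    dynamodb.update_table(
        TableName="Orders",
        ProvisionedThroughput={
            "ReadCapacityUnits": 200,
            "WriteCapacityUnits": 100,
        },
    )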
Amazon DynamoDB seems like a pretty decent NoSQL solution. It is fast, and it is pretty easy to use. Other than having an AWS account, there really isn't any setup or maintenance required. The feature set and API are fairly small right now compared to MongoDB/CouchDB/Cassandra, but I would expect them to grow over time as feedback from the developer community is received. Right now, all of the official AWS SDKs include a DynamoDB client.
Pros
Lightning fast (uses SSDs internally)
Really (really) reliable (the chances of write failures are low)
Seamless scaling (no need to do manual sharding)
Works as a web service (no server, no configuration, no installation)
Easily integrated with other AWS features (you can export the whole table to S3 or use EMR, etc.)
Replication is managed internally, so the chance of accidental data loss is negligible.
Cons
Very (very) limited querying (see the sketch at the end).
Scanning is painful (I remember a scan through Java once running for 6 hours).
Pre-defined throughput, which means a sudden increase beyond the set throughput will be throttled.
Throughput is partitioned as the table is sharded internally (which means that if you provisioned a throughput of 1000 and the table is partitioned in two, and you are reading only the latest data (from one partition), then your effective read throughput is only 500).
No joins; limited indexing (basically two indexes).
No views, triggers, scripts, or stored procedures.
It's really good as an alternative to session storage in a scalable application. Another good use would be logging/auditing in an extensive system. NOT preferable for a feature-rich application with frequent enhancements or changes.
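To illustrate the querying limitation above: a Query touches only the items under one partition key, while a Scan walks the entire table page by page, which is why long scans hurt. A hedged sketch in Python with boto3 (table and attribute names are hypothetical):

    import boto3
    from boto3.dynamodb.conditions import Key

    table = boto3.resource("dynamodb").Table("Orders")

    # Cheap: a targeted read by partition key.
    resp = table.query(KeyConditionExpression=Key("customer_id").eq("c-42"))

    # Expensive: a full-table scan, paginated until LastEvaluatedKey
    # runs out.
    items, start_key = [], None
    while True:
        kwargs = {"ExclusiveStartKey": start_key} if start_key else {}
        page = table.scan(**kwargs)
        items.extend(page["Items"])
        start_key = page.get("LastEvaluatedKey")
        if not start_key:
            break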