creating a different database for each collection in MongoDB 2.2 - mongodb

MongoDB 2.2 has a write lock per database, as opposed to the global server-wide write lock of previous versions. So would it be OK if I stored each collection in a separate database to effectively have a write lock per collection? (This would make it look like MyISAM's table-level locking.) Is this approach faulty?

There's a key limitation to this locking, and that is the local database. That database includes the oplog collection, which is used for replication.
If you're running in production, you should be running with Replica Sets. If you're running with Replica Sets, you need to be aware of the write lock effect on that database.
Breaking out your 10 collections into 10 DBs is useless if they all block waiting for the oplog.
Before taking a large step to re-write, please ensure that the oplog will not cause issues.
Also, be aware that MongoDB implements DB-level security. If you're using any security features, you are now creating more DBs to secure.
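For a sense of the security overhead, here is a hedged PyMongo sketch of granting one service user access per database; note the createUser command shown is from MongoDB 2.6+ (the 2.2-era equivalent was addUser), and all names here are made up:

```python
from pymongo import MongoClient

client = MongoClient('mongodb://admin:secret@localhost:27017')

# With one database per collection, every database needs its own
# user grants, backups, and monitoring entries.
for db_name in ['app_events', 'app_sessions', 'app_misc']:
    client[db_name].command('createUser', 'svc_user',
                            pwd='svc_password',
                            roles=['readWrite'])
```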

Yes, that will work; 10gen actually presents this as an option in their talks on locking.
I probably wouldn't isolate every collection, though. Most databases seem to have 2-5 high-activity collections. For the sake of simplicity it's probably better to keep the low-activity collections grouped in one DB and put the high-activity collections in their own databases.
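As a rough illustration, here is a minimal PyMongo sketch of that layout (the database and collection names are hypothetical):

```python
from pymongo import MongoClient

client = MongoClient('mongodb://localhost:27017')

# High-activity collections each get their own database, so
# (in MongoDB 2.2) their writes contend on separate per-database locks.
events = client['app_events']['events']
sessions = client['app_sessions']['sessions']

# Low-activity collections stay grouped in a single database.
misc = client['app_misc']
users = misc['users']
settings = misc['settings']

events.insert_one({'type': 'click'})
users.insert_one({'name': 'alice'})
```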

Related

Is there any performance impact to have multiple Mongo databases?

We are currently working on an application using Mongo, and we are trying to evaluate the benefits and constraints of the different architecture choices: spreading data across multiple databases/collections versus using a single shared one.
Are there any performance penalties for one single database with a lot of collections versus many databases with fewer collections per database?
From what I understand there should not be any impact, because sharding is done on a per-collection basis, but I would like some confirmation.
Regards
By performance, I guess you mean read/write speed. Using multiple databases with fewer collections can increase your read/write throughput, since each database handles the read/write operations on its own collections independently.
However, spreading data across databases this way can add extra complexity to your project. Depending on how your codebase is structured, it might complicate your application logic; backups and other admin database operations won't be straightforward; and ad-hoc queries across collections that live in different databases would be next to impossible.
If the goal of the architecture design is to ensure high read/write speed, you can still go with a single DB that is auto-scaled at the deployment level. I don't know much about it, but I think replication is a MongoDB feature that can help you achieve such scaling, and if you are open to database-as-a-service, you should check out MongoDB Atlas, where auto-scaling is available out of the box.

Should I use different databases or just different collections in MongoDB to store user information and the rest of the data?

I am pretty new to MongoDB. I am creating an application where I will have users and a lot of other data. I have already created a database where I am storing user information. Now I have to create a new database or collection to store the rest of the data. What are the pros and cons of creating a different database versus a different collection?
I use MongoDB in a very similar way and have already thought a lot about dividing my database. Here are some of the things we considered:
Using 2 databases is harder to maintain: your application will have to know which database to update, and it can increase costs (even more if you intend to monitor the databases or host them on different infrastructure).
Mongo 2 used to lock the entire database when updating, so back then it would have been better to separate them, but Mongo 3 with WiredTiger locks only the document, so you won't have the problems we used to have in the past.
One good thing about splitting the database in two is that even if your data overwhelms one database, the other will still work.
IMHO, if you use a decent machine to store your databases and monitor them the right way, you won't have any trouble keeping just one until your system is giant, with millions of active users. You can also use Replica Sets and Sharding to increase efficiency.

Does MongoDB have features such as triggers and stored procedures, as in a relational database?

As the title suggests (leaving the map-reduce framework aside): if I want to trigger an event to run a consistency check or security operations before a record is inserted, how can I do that with MongoDB?
MongoDB does not support triggers, but people have created solutions around them, mostly using the oplog, though this will only help you if you are running with replica sets, as the oplog is a capped collection that keeps track of data changes for the purposes of replication.
For a nodejs solution see: https://www.npmjs.org/package/mongo-watch or see an earlier SO thread: How to listen for changes to a MongoDB collection?
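For illustration, here is a minimal Python sketch of tailing the oplog with PyMongo; it assumes a replica set is running, and the field names follow the standard oplog entry format:

```python
import time
from pymongo import MongoClient, CursorType, DESCENDING

client = MongoClient('mongodb://localhost:27017/?replicaSet=rs0')
oplog = client.local['oplog.rs']

# Start tailing from the most recent oplog entry.
last = oplog.find_one(sort=[('$natural', DESCENDING)])
ts = last['ts']

while True:
    cursor = oplog.find({'ts': {'$gt': ts}},
                        cursor_type=CursorType.TAILABLE_AWAIT)
    while cursor.alive:
        for entry in cursor:
            ts = entry['ts']
            # 'op' is the operation ('i' insert, 'u' update, 'd' delete),
            # 'ns' the namespace, 'o' the document or change.
            print(entry['op'], entry['ns'], entry.get('o'))
        time.sleep(1)
```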
If you are concerned with consistency, read about write concern in MongoDB: http://docs.mongodb.org/manual/core/write-concern/ You can be as relaxed or as strict as you want by setting the write concern level on inserts, from fire-and-forget to requiring acknowledgement from all members of the replica set.
So, if you want to run a consistency check before inserting data, you probably will have to move that logic to the client application and set a write concern level that will ensure consistency.
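As a sketch of what that might look like in PyMongo (the database, collection, and field names are made up):

```python
from pymongo import MongoClient
from pymongo.write_concern import WriteConcern

client = MongoClient('mongodb://localhost:27017/?replicaSet=rs0')

# w='majority' waits for acknowledgement from a majority of replica
# set members; w=0 would be fire-and-forget.
orders = client['shop'].get_collection(
    'orders', write_concern=WriteConcern(w='majority', j=True))

doc = {'item': 'book', 'qty': 1}
# Client-side consistency check before the insert.
if doc['qty'] > 0:
    orders.insert_one(doc)
```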
MongoDB does not have triggers or stored procedures. While there are solutions that some have used to try to emulate the behavior, as it is not a built-in feature, you'll need to decide whether those solutions are effective for you. Searching for "triggers and mongodb" should find dozens. All depend on the oplog and replica sets.
But, given the nature of MongoDB and a typical 3-tier architecture, I would expect that at the point of data insertion, which could be on a web server for example, you would run the necessary consistency and security checks there. You wouldn't allow a client such as a mobile application to write directly into a database collection without some checks.
Many drivers for MongoDB and extended libraries already have validation and consistency checks built in, so there is less to do. Using unique indexes on some fields can also provide a level of consistency that you cannot get from the driver alone. Look at calls like findAndModify, which perform atomic updates.
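A hedged PyMongo sketch of those two techniques (the collection and field names are illustrative):

```python
from pymongo import MongoClient, ReturnDocument

client = MongoClient()
users = client['app']['users']

# A unique index enforces consistency at the database level:
# duplicate emails are rejected no matter which client writes them.
users.create_index('email', unique=True)

# find_one_and_update is the driver's findAndModify: it reads and
# modifies a document in one atomic step.
updated = users.find_one_and_update(
    {'email': 'alice@example.com', 'credits': {'$gt': 0}},
    {'$inc': {'credits': -1}},
    return_document=ReturnDocument.AFTER)
```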

Using a Database vs a Collection in MongoDB

I am building a site with users who have discussions and write blogs and plan to use MongoDB as the database for the site. Which architecture option would be more efficient and allow for easier data flow between them:
One Database with a Blogs Collection, a Discussions Collection, and a User Activity Collection? Each collection would be sharded as appropriate.
A Blogs Database, a Discussions Database, and a User Activity Database? Each database would be broken into collections and sharded as appropriate.
It won't make a big difference whether you put everything into a single database or into multiple databases until you find you need to do something that's handled on the database level, for example access control, or placing database files on separate physical devices (to reduce I/O contention).
In addition, locking granularity is currently at the database level, so if you happen to have a very large number of small writes, sending them to different databases means they will not contend for the same lock. Since you anticipate sharding, you can also place each database on a different shard, which may allow you to defer actually sharding any particular collection, since each shard would only handle the traffic for that database's collection(s).
I would say if you are in doubt go ahead and put them in separate databases, it's unlikely to hurt and it may help.
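A minimal sketch of pinning each database to its own shard via admin commands (the shard and database names are hypothetical):

```python
from pymongo import MongoClient

# Connect to a mongos router of an existing sharded cluster.
client = MongoClient('mongodb://mongos-host:27017')

for db_name, shard in [('blogs', 'shard0000'),
                       ('discussions', 'shard0001'),
                       ('activity', 'shard0002')]:
    # movePrimary places the database's unsharded collections
    # on the given shard.
    client.admin.command('movePrimary', db_name, to=shard)
```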
Mongo will work, but getting familiar with it may take time depending on your experience.
If you use MySQL (or another SQL db) you may have an easier time. You should probably just create separate tables for your blogs, discussions, and activity, rather than multiple databases.
Another factor to consider is the size of your databases. An SQL database is fine for most applications, even fairly large ones. MongoDB (and other NoSQL db's) are great for scaling big data.
Hope this helps!

Configure a Mongo replica set to only replicate certain collections

I have a ~3GB mongo database with several dozen collections. Three of these collections handle ~300 queries per second, while the rest sustain a much lower volume. I expect the traffic to continue to grow quickly.
I'd like to set up a replica set to handle the high-traffic collections. It isn't necessary for this new instance to replicate the rest of the database. Is this possible?
This doesn't seem to be possible at the moment with MongoDB's built-in features; the only way to do it is to come up with your own manual replication algorithm or to use third-party tools.
The https://github.com/wordnik/wordnik-oss project might help you achieve this, according to the following post:
https://groups.google.com/forum/?fromgroups=#!topic/mongodb-user/Ap9V4ArGuFo
It describes a workaround for filtering documents during replication:
Replicate only documents where {'public':true} in MongoDB
Or just replicate the data yourself manually, which might be worth trying.
Good luck.
No, that isn't possible right now. What you could do is move the other collections into a separate, unreplicated database. But this will cause headaches once those collections see higher traffic too, since you would then have to move them back into the replicated database.
But in general, replication isn't the way to go if you need to scale; it's intended more for DR/failover. Replica set secondaries can only (optionally) answer read queries, not write queries; this is something you should keep in mind. So if you have a high write load, this may not cure your problem.
Once you allow your application to read from secondaries, you have to live with eventual consistency, meaning that your application is not guaranteed to always see the latest data. This is due to the asynchronous replication to the secondaries.
Indeed, you can cure this problem by configuring your write concern so that a write has to succeed on all replicas before it is considered written and your driver returns. But this may slow down your write operations significantly.
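As a rough PyMongo sketch of both knobs (the host names and three-member replica set are assumptions):

```python
from pymongo import MongoClient, ReadPreference
from pymongo.write_concern import WriteConcern

client = MongoClient('mongodb://host1,host2,host3/?replicaSet=rs0')

# Reads may go to a secondary and can return slightly stale data.
reads = client['mydb'].get_collection(
    'events', read_preference=ReadPreference.SECONDARY_PREFERRED)

# w=3 waits until a write has reached all three members, trading
# write latency for fresher reads on the secondaries.
writes = client['mydb'].get_collection(
    'events', write_concern=WriteConcern(w=3))
```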
So for scaling query execution capability I would go with sharding. This is possible on a per-collection level; all unsharded collections will remain on the database's primary shard.
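A minimal sketch of sharding only the hot collections (the collection names and shard key are hypothetical):

```python
from pymongo import MongoClient

client = MongoClient('mongodb://mongos-host:27017')  # a mongos router
client.admin.command('enableSharding', 'mydb')

# Only the high-traffic collections are sharded; everything else
# stays on the database's primary shard.
for coll in ['hits', 'sessions', 'metrics']:
    client.admin.command('shardCollection', 'mydb.' + coll,
                         key={'user_id': 1})
```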
It's not possible, but if the data size is that small and those collections are rarely updated, the only overhead of having them replicated is the small extra storage on the secondary. That is a relatively small price to pay, especially since the collections won't grow in size, compared with writing your own replication logic.
Alternatively, archive the data: keep only the latest data set on the production server, and archive the rest of the data on the new server.