MongoDB Replica Sets - mongodb

I am new to MongoDB and have little experience at the moment, so I need a little help. We are looking at setting up MongoDB with a standard replica set, which, as I understand it, contains a primary and two secondaries. My question is this: will the primary and two secondaries definitely require different servers or VMs (I have read this is the case but am still not sure)? We will be performing a fair number of writes each time a user logs into the system.
Currently we are just looking into the feasibility of this set up at the moment and nothing has been decided yet.
Thanks in advance.

Related

Oplog tailing in Meteor - to do it or not to do it?

I am trying to reconcile this kadira.io article, which says that oplog tailing is a must for every production Meteor app, with this compose.io article (see the section "To Oplog or not Oplog"), which says you should only use the oplog in certain circumstances.
Basically I have a Meteor app which does not have a high volume of users or a massive amount of continuous writing to collections.
It does however need to read a lot of data from the DB which seems to be slowing things down.
As far as I know it is only running on one server.
I am wondering: will adding oplog tailing speed things up?
Thanks in advance.
Basically, no matter whether you do it, the replica set is always tailing the oplog to keep all nodes in sync. Usually, if your system is not write heavy, the tailing shouldn't be an issue, because with replication working the latest oplog entries should be in memory. What causes stress is usually the first round, when the program tries to find where to tail from: with no index, that has to be a COLLSCAN. Other than that there's no need to worry. It's a one-time thing, so as long as you know what's going on, it should be fine.
Back to your question: yes, it's running on one server. Which one depends on your readPreference and replica set tags, if any. And after the first time finding the tail point, it shouldn't be a problem.
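To make that one-time "first round" cost concrete, here is a rough sketch of the two query shapes a tailing client uses. The collection name local.oplog.rs, its ts field, natural-order sorting, and tailable/await cursors are real MongoDB details; the helper functions themselves are illustrative stand-ins for what a driver would issue, not a real API.

```python
# Sketch of the two query shapes used when tailing the oplog.
# local.oplog.rs, the `ts` field, $natural sorting, and tailable cursors
# are real MongoDB details; these helpers are illustrative stand-ins.

def find_tail_point_query():
    # One-time query: read the newest entry in natural order.
    # Locating an arbitrary starting timestamp instead would mean a
    # COLLSCAN over the (unindexed) oplog -- the expensive "first round".
    return {
        "collection": "local.oplog.rs",
        "filter": {},
        "sort": [("$natural", -1)],  # newest entry first
        "limit": 1,
    }

def tail_query(last_ts):
    # Ongoing query: a tailable, await-data cursor that only returns
    # entries written after the last timestamp we have already seen.
    return {
        "collection": "local.oplog.rs",
        "filter": {"ts": {"$gt": last_ts}},
        "cursor_type": "tailable_await",
    }

q = tail_query(last_ts=(1700000000, 1))
```

Once the tail point is known, every subsequent read is the cheap `ts`-bounded tail, which is why the cost is only paid once.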

mongodb connection handling for multi-tenant architecture

We are currently designing a SaaS application that has a subscriber/user based mode of operation. For example, a single subscriber can have 5, 10 or up to 25 users in their account dependent on what type of package they are on.
At the moment we are going with a single database per tenant approach. This has several advantages for us from the standpoint of the application.
I have read about the connection limits associated with Mongo and I am a little confused and worried. I was hoping someone could clarify it for me in simple terms, as I haven't worked much with Mongo.
From what I understand, there is a hard limit of 20,000 connections available on the mongod process and the mongos processes.
How does that translate to this multi-tenant approach? I am basically trying to assess how I would deploy the application in terms of replica sets, and whether sharding is necessary, so that I don't hit these limits. How does one handle a scenario where, for example, you have 10,000 tenants with multiple users, which would exceed the limit?
Our application generally wouldn't need sharding, as each tenant's collections wouldn't reach the point where they would need to be sharded. From what I understand, though, MongoDB will create databases in a round-robin fashion on each shard and distribute the load, which may help with the connection issues.
This is me just trying to make sense of what I've read and I'm hoping someone can clear this up for me.
Thanks in advance
EDIT
If I just add replica sets, will this alleviate the problem, even though, from what I understand, only the primary can accept writes?
You just have to store a database connection in a pool and reuse it if you access the same database again. This will keep the number of connections reasonable, unless you are using tens of thousands of databases, which wouldn't be a good idea anyway.
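A minimal sketch of that pooling idea, with a hypothetical `connect` factory standing in for the real driver call. With an actual driver such as pymongo you would share one client across tenants and ask it for per-tenant database handles, since all databases on one client share the same underlying connection pool.

```python
class TenantConnectionPool:
    """Cache one database handle per tenant and reuse it on later requests."""

    def __init__(self, connect):
        # `connect` is a factory: tenant name -> database handle.
        # With pymongo this might be: lambda name: shared_client[name]
        self._connect = connect
        self._handles = {}

    def get(self, tenant):
        # Reuse the existing handle if we have seen this tenant before,
        # so repeated requests do not open new connections.
        if tenant not in self._handles:
            self._handles[tenant] = self._connect(tenant)
        return self._handles[tenant]

# Usage with a dummy factory so the sketch is self-contained:
opened = []
pool = TenantConnectionPool(lambda name: opened.append(name) or f"db:{name}")
a1 = pool.get("tenant_a")
a2 = pool.get("tenant_a")   # reused: no second "connection" is opened
b = pool.get("tenant_b")
```

The design point is simply that the number of open handles tracks the number of distinct active tenants, not the number of requests.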

MongoDB Fail Over with one address

I would like to know if it is at all possible to have MongoDB failover using only a single address. I know replica sets are typically used for this, relying on the driver to make the switchover, but I was hoping there might be a solution out there that would allow one address or hostname to automatically switch over when the MongoDB instance is recognized as being down.
Any such luck? I know there are solutions for MySQL, but I haven't had much luck with finding something for MongoDB.
Thanks!
Yes, it is possible: the driver holds a cached map of your replica set, which it will query for a new primary when the set goes through an election. This map is refreshed every so often; however, if your application restarts (the process quits, or on each request in PHP's fork mode), the driver has no choice but to rebuild its map. At that point you will suffer connectivity problems.
Of course, the best thing to do is to add a seed list.
Using a single IP defeats the redundancy that is built into MongoDB.
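A seed list is just a connection string that names several members plus the replica set name, so the driver can discover the current primary from whichever seed is reachable. A small sketch of building one (the host names and set name are made-up placeholders):

```python
def seed_list_uri(hosts, replica_set):
    # Drivers accept multiple comma-separated host:port pairs; any one
    # reachable seed lets the driver discover the whole replica set.
    return "mongodb://{}/?replicaSet={}".format(",".join(hosts), replica_set)

uri = seed_list_uri(
    ["db1.example.com:27017", "db2.example.com:27017", "db3.example.com:27017"],
    "rs0",
)
```

With this, losing any single host does not cut the application off, because the driver can bootstrap from the remaining seeds.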

MongoDB: Is there a better way to copy production environment data to a testing environment?

Now I have two shards, shard3 (16 GB) and shard4 (15 GB), and three machines.
The deployment looks like this:
10.10.10.5: (mongos, config server, shard3 primary, shard4 primary)
10.10.10.6: (mongos, config server, shard3 secondary, shard4 secondary)
10.10.10.7: (mongos, config server, shard3 arbiter, shard4 arbiter)
Now I want to run a performance test (about adding new shards). I know I can't use the production environment for testing, since that would impact production performance, so I want to copy all of the data to my three testing machines: 20.20.20.5, 20.20.20.6 and 20.20.20.7. I read the manual but couldn't find a good way to do this, so could you please give me some advice?
By the way, two small questions. First: in my production environment, how do I change the arbiter node to a secondary? I.e., I want to change 10.10.10.7 to a secondary so that it can share the read load with 10.10.10.6.
Second: how do I make mongos read from the primary node rather than a secondary? As I understand it, mongos writes to the primary but reads from a secondary, and I want both reads and writes on the primary so I get the newest data immediately.
Thanks in advance
Jack
You should look at the following documentation: http://www.mongodb.org/display/DOCS/Import+Export+Tools. You can likely use mongoexport and mongoimport for what you want to do, or you can use mongodump and mongorestore. These will allow you to back up your data and restore it onto the testing environment.
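As a rough sketch of the dump-and-restore route (the hosts, ports, and paths here are placeholders for your own setup), running the dump through a mongos captures data from all shards:

```shell
# Dump the production data via a mongos (adjust host/port/path for your setup).
mongodump --host 10.10.10.5 --port 27017 --out /backup/proddump

# Copy the dump directory to the test environment, then restore it
# through the test cluster's mongos.
mongorestore --host 20.20.20.5 --port 27017 /backup/proddump
```

Since these commands run against live clusters, run the dump during a quiet period to limit the impact on production.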
First question: you can't "convert" an arbiter into a secondary. The only real way to do this is to add a new node to the replica set, then take down the arbiter and remove it from the replica set. You can add a new mongod to the replica set using rs.add() on an existing replica set member. Doing it in this order avoids downtime; if you don't mind downtime, the order doesn't particularly matter. Documentation on adding a node to a replica set can be found here: http://www.mongodb.org/display/DOCS/Replica+Sets+-+Basics
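For illustration, the add-then-remove sequence in the mongo shell might look like this (the host:port values are placeholders for your own members):

```javascript
// Run against the current primary. Add the new data-bearing member first,
// then remove the arbiter -- this order avoids downtime.
rs.add("10.10.10.7:27018")      // new secondary (placeholder host:port)
rs.remove("10.10.10.7:27017")   // the old arbiter (placeholder host:port)
```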
Also, if you are doing read scaling and using slaveOk(), it's important to keep in mind that if there is replication lag from the primary to either of the secondaries, you may read stale data. If that is acceptable for your application then it's fine, but realize that if you simultaneously query two nodes in a replica set, you may read two different values for the same query, depending on replication lag.
Second question: if you want to always read and write only from the primary, you should not run with slaveOk. It is off by default, but if you have already turned it on, just call it again passing in false.
As long as you have a primary up and a majority of your replica set, the replica set will be writable. So as long as the primary is not overloaded and you have not taken down 2 out of 4 of your nodes, you will be able to write to the primary; the writes will simply replicate only to the nodes that are up.
In general, we try to discourage even-numbered replica sets, so I would take the arbiter offline and stick with your 3 data-bearing members. You really don't gain anything from having the arbiter: if 2 nodes fail, either way you do not have a majority and the replica set becomes read-only, and if one node goes down you will still be up for writes. The arbiter doesn't help anything in an even-numbered set.
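The majority arithmetic behind that advice can be sketched quickly (the helper is hypothetical, but the counting matches how replica set elections work):

```python
def majority(voting_members):
    # A replica set can elect a primary only with a strict majority of votes.
    return voting_members // 2 + 1

# With 4 voting members, losing 2 leaves 2 survivors, below majority(4) == 3:
# the set goes read-only. With 3 voting members, losing 1 leaves 2 survivors,
# which still meets majority(3) == 2: the set stays writable.
survivors_even = 4 - 2
survivors_odd = 3 - 1
```

This is why the extra arbiter in an even-sized set buys no additional failure tolerance.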
When you tried to perform these queries that went to only one secondary, was the new secondary fully caught up data-wise? If it was in the RECOVERING state, it's possible that reads had not been routed to it yet, since it had not replicated all of the data. Aside from that, there are ways to specify read preferences.
Documentation on all of the different read preferences and how to use them can be found here: http://docs.mongodb.org/manual/applications/replication/#read-preference
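For the second question, read preference can also be set right in the connection string. A small sketch (the hosts and set name are placeholders); `primary` keeps every read on the primary, which is also the default:

```python
def uri_with_read_preference(hosts, replica_set, read_preference):
    # readPreference=primary routes all reads to the primary (the default);
    # secondaryPreferred would allow reads from secondaries when available.
    return "mongodb://{}/?replicaSet={}&readPreference={}".format(
        ",".join(hosts), replica_set, read_preference
    )

uri = uri_with_read_preference(
    ["10.10.10.5:27017", "10.10.10.6:27017"], "shard3", "primary"
)
```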

MongoDb vs CouchDb: write speeds for geographically remote clients

I would like all of my users to be able to read and write to the datastore very quickly. It seems like MongoDB has blazing reads, but writes could be very slow if the single master DB is located far away from the client.
CouchDB seems to have slow reads, but what about its writes when the client is very far away from the master?
With couchdb, we can have multiple masters, meaning we can always have a write node close to the client. Could couchdb actually be faster for writes than mongodb in the case when our user base is spread very far out geographically?
I would love to use mongoDb due to its blazing fast speed, but some of my users very far away from the only master will have a horrible experience.
For worldwide types of systems, wouldn't CouchDB be better? Isn't MongoDB completely ruled out in the case where you have users all around the world?
MongoDb, if you're listening, why don't you do some simple multi-master setups, where conflict resolution can be part of the update semantic?
This seems to be the only thing standing in between mongoDb completely dominating the nosql marketshare. Everything else is very impressive.
Disclosure: I am a MongoDB fan and user; I have zero experience with CouchDB.
I have a heavy-duty app that is very read/write intensive. I'd say reads outnumber writes by a factor of around 30:1. The way Mongo is designed, reads are always going to be much faster than writes; the trick (in my experience) is to make your writes so efficient that you can dedicate a higher percentage of your system resources to them.
When building a product on top of Mongo, the key thing to remember is the _id field. This field is automatically generated and added to all of your documents and looks something like 47cc67093475061e3d95369d. When you design your queries (finds), try to query on this field wherever possible: the ObjectId encodes a timestamp and a machine identifier, and, more importantly, _id is indexed by default, so finds and updates on it are really fast. Consider this in the design of your system.
Example:
Two of the collections in my database are "users" and "posts". A user can create multiple posts. These two collections have to reference each other a lot in the implementation of my app.
In each post object I store the _id of the parent user.
In each user object I store an array of the _ids of all the posts the user has authored.
Now on each user page I can generate a list of all the authored posts without a resource-stressful query, just a direct lookup by _id. The bigger the Mongo cluster, the bigger the difference this makes.
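That two-way linking can be sketched with plain dicts standing in for the two collections (the field names here are made up; the point is that rendering the user page becomes direct by-_id lookups, which is exactly what Mongo's default _id index makes cheap):

```python
# Simplified stand-ins for the "users" and "posts" collections,
# keyed by _id the way Mongo's default _id index would be.
users = {}
posts = {}

def create_post(user_id, text):
    post_id = "post{}".format(len(posts) + 1)
    # Each post stores the _id of its parent user...
    posts[post_id] = {"_id": post_id, "user_id": user_id, "text": text}
    # ...and the user stores an array of its posts' _ids.
    users[user_id]["post_ids"].append(post_id)
    return post_id

users["u1"] = {"_id": "u1", "name": "alice", "post_ids": []}
create_post("u1", "first!")
create_post("u1", "second post")

# Rendering the user page is a series of direct _id lookups, not a scan:
user_posts = [posts[pid]["text"] for pid in users["u1"]["post_ids"]]
```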
If you're at all familiar with Oracle's physical-location rowids, you may recognize this concept; in Mongo it is much more awesome and powerful.
I was scared last year when we decided to finally ditch MySQL for Mongo, but I can tell you the following about my experience:
- Data porting is always horrible, but it went as well as I could have imagined.
- Mongo is probably the best-documented NoSQL DB out there, and the open source community is fantastic.
- When they say fast and scalable, they're not kidding; it flies.
- Schema design is very easy and much more natural and orderly than key/value-type DBs, in my opinion.
- The whole system seems designed for minimal user complexity; adding nodes etc. is a breeze.
Ok, seriously, I swear Mongo didn't pay me to write this (I wish), but apologies for the love fest.
Whatever your choice, best of luck.
Here is a detailed article that 10gen has created, and gives examples of when you should choose MongoDB or CouchDB, with reasons as well.
http://www.mongodb.org/display/DOCS/Comparing+Mongo+DB+and+Couch+DB
Edit
The above link was removed, but can be viewed here: http://web.archive.org/web/20120614072025/http://www.mongodb.org/display/DOCS/Comparing+Mongo+DB+and+Couch+DB
Your question, as it stands, is full of speculation and guesswork.
...why can't we opt out of consistency for certain writes, so long as we're sure that the person that wrote the data will be able to read it consistently, whereas others will observe eventual consistency
What if those writes affect other writes? What if those writes would prevent other people from doing things? It's hard to tell what the possible side effects are, since you didn't give us any specifics.
My main suggestion to you is that you do some testing. Unless you've tested it, speculation about bottlenecks is a complete waste of time. You don't need to test via remote machines: set up some local DBs, add some artificial lag, then run your tests.
That way you can test the different options you've got and see where MongoDB is better or where CouchDB excels. Then you can either pick one of them and live with its drawbacks, or try to tweak your database model itself and do more tests.
Nobody here will be able to give you a general solution to your specific problem (well, unless you give us all your code and pay us to work on it :P). Databases aren't easy, especially if you need to scale them under certain requirements.
For worldwide types of systems, wouldn't CouchDB be better? Isn't MongoDB completely ruled out in the case where you have users all around the world?
MongoDB supports sharding. So you don't need a single master. In fact, it looks like you have a ready shard key (region).
MongoDB also supports replica sets along with sharding. So if you need to run in multiple data centers (DCs) you put a master and one of the replicas in the same DC. In fact, they also suggest adding a 3rd node to a separate DC as a hot backup failover.
You will have to drill into the more detailed configuration of MongoDB, but you can definitely control where data is stored and you can prioritize that other replicas in a DC are "next in line" for promotion to Master.
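For instance, in the mongo shell that promotion ordering is expressed through member priorities (the member indexes here are placeholders for your own configuration):

```javascript
// Sketch: raise the priority of the replica in the preferred DC so it is
// "next in line" for primary. Member indexes are placeholders.
cfg = rs.conf();
cfg.members[1].priority = 2;    // preferred failover target, same DC
cfg.members[2].priority = 0.5;  // remote-DC replica, less likely to be elected
rs.reconfig(cfg);
```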
At this point however, you're well into the details of MongoDB and you'll need to dig around and "play" quite a bit. However, you'll need lots of "play time" for any solution that's really going to handle masters across data centers.