Set mongodb read preference at mongo server level - mongodb

I have one PRIMARY instance and one SECONDARY instance of MongoDB.
Many clients use these two instances, and each client sets its own read preference to "secondary".
My question is:
Is there a way to configure MongoDB so that the read preference defaults to "secondary"?
Thanks
MC

Read preference is a client setting, not a server setting, so no, this is not possible as far as I know. An important feature of MongoDB is that you have very fine-grained control over the queries, i.e. you can use different read preferences and write concerns for each query.
It often makes sense to mix these, because losing a log entry might not be too bad while losing a payment is. Likewise, reading logs from the secondary might be fine, but if you want to coordinate a transaction, it might be safer to read from the primary (unless you are using a paranoid write concern that requires full replication before considering the write successful).
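Because the setting lives on the client, the closest thing to a shared default is probably a common connection string that every client uses. A minimal sketch, assuming hypothetical host names and a replica set named rs0; `readPreference` is a standard MongoDB connection string option, and the helper `build_mongo_uri` is made up for illustration:

```python
from urllib.parse import urlencode

def build_mongo_uri(hosts, replica_set, read_preference="secondary"):
    """Build a MongoDB connection string that carries a default read preference."""
    options = urlencode({"replicaSet": replica_set, "readPreference": read_preference})
    return "mongodb://" + ",".join(hosts) + "/?" + options

# Hypothetical hosts and replica set name, for illustration only.
uri = build_mongo_uri(["db1.example.net:27017", "db2.example.net:27017"], "rs0")
print(uri)
```

Individual queries can still override this default through the driver. With a single secondary, `secondaryPreferred` may be the safer choice, since it falls back to the primary when no secondary is available.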

Related

Transactional guarantee in mongodb

So, I am doing research on MongoDB, in line with upper management's decision to embrace open source, migrate our existing product database from SQL Server to MongoDB, and revamp the entire thing. Do note that our database must prioritize data consistency and transactional guarantees.
And I discovered this post. A summary of the post is as follows:
MongoDB claims to be strongly consistent, but a lot of evidence
recently has shown this not to be the case in certain scenarios (when
network partitioning occurs, which can happen under heavy load). This
means that you can potentially lose records that MongoDB has
acknowledged as "successfully written".
In terms of your application, if you have a need to have transactional
guarantees (meaning if you can't make a durable write, you need the
transaction to fail), you should avoid MongoDB. Example scenarios
where strong consistency and durability are essential include "making
a deposit into a bank account" or "creating a record of birth". To put
it another way, these are scenarios where you would get punched in the
face by your customer if you indicated an operation succeeded and it
didn't.
So, my questions are as follows:
1) To what extent is "lost data" still a valid concern in the current version of MongoDB?
2) What approach can be taken to ensure transactional guarantees in MongoDB?
I am pretty sure that if a company like PayPal uses MongoDB, there is certainly a way of overcoming these issues.
The references in that post have been discussed here before (for example, here is one: MongoDB: Does Write Concern guarantee the write on primary plus atleast one secondary ). No need to duplicate your question.
The blog "Aphyr" mostly uses these articles to tout its own tech (if you read the entire blog you will realise they have their own database which they market). Every database they show loses data except their own.
2) What approach can be taken to ensure transactional guarantees in MongoDB?
I agree you should be handling database problems in client code, if not then how is your client side ever going to remain consistent in the event of partitions?
Since you are not Harry Potter (are you?) I will say that you need to check for exceptions thrown in your client code and react to them as needed.
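Reacting to exceptions in client code might look roughly like the sketch below. It is driver-agnostic: `WriteFailed` is a hypothetical stand-in for whatever write or network error your driver raises, and `write_with_retry` is an illustrative helper, not part of any library. Blind retries are only safe when the write is idempotent.

```python
import time

class WriteFailed(Exception):
    """Hypothetical stand-in for a driver's write or network error."""

def write_with_retry(write, attempts=3, delay=0.0):
    """Call write() until it succeeds; re-raise the last error if every attempt fails."""
    for attempt in range(1, attempts + 1):
        try:
            return write()
        except WriteFailed:
            if attempt == attempts:
                raise  # surface the failure so the caller can abort the operation
            time.sleep(delay)

# Simulated write that fails twice before succeeding.
calls = {"count": 0}

def flaky_write():
    calls["count"] += 1
    if calls["count"] < 3:
        raise WriteFailed("no acknowledgement from the server")
    return "acknowledged"

result = write_with_retry(flaky_write)
print(result, calls["count"])
```

If the final attempt still fails, the exception propagates, which lets the calling code fail the whole operation instead of reporting a success that never happened.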
1) To what extent is "lost data" still a valid concern in the current version of MongoDB?
As for the bug he mentions in 2.4.3: he fails (as I mention in the linked post) to state the bug reference, so again, still no comment.
Besides 2 writes in 6,000? That's less data loss than I have seen in MySQL on a partition! So not too shabby.
I have not noticed such behaviour in my own app, and I have not seen anyone, from small to extremely large sites, reproduce the benchmark-type scenarios displayed in that article. I doubt very much that you will.
I am pretty sure that if a company like PayPal uses MongoDB, there is certainly a way of overcoming these issues.
They would have tight coding to ensure consistency in distributed environments.
Of course, they would start by choosing the right tech for the situation...
Write Concern Reference
Write concern describes the level of acknowledgement requested from MongoDB for write operations to a standalone mongod or to replica sets or to sharded clusters. In sharded clusters, mongos instances will pass the write concern on to the shards.
https://docs.mongodb.org/v3.0/reference/write-concern/

Multiple mongodb servers seen as one and data flow management

For my application I need to periodically move old data from one MongoDB server to another (i.e., two distinct servers). I also want to be able to query that data as if it were all in the same database.
In short, I want to be able to see two MongoDB instances (on two different servers) as one and be able to control when and where the data is stored.
I read about the concept of sharding and chunks and quickly found the moveChunk function, which can easily do what I want.
The problem is that it seems to be impossible to configure such an architecture in MongoDB. Am I missing something here?
Archiving Deleted Documents
For the problem of keeping the deleted documents, you cannot achieve this with built-in features or mechanisms like sharding or replication. The only way to do it is to handle that case manually, e.g. by keeping a separate collection for deleted documents and simply moving documents to that collection instead of deleting them.
For your global problem of moving data you have the following two options:
Sharding
Using sharding, you split your data into pieces that are stored on (in your case) two different servers. In this scenario you can use the moveChunk method, as you mentioned. But this method is very tricky, because you need to disable the built-in automatic balancer to have full manual control over your chunks. In any case, this is not recommended by the MongoDB documentation:
Only use the moveChunk in special circumstances such as preparing your sharded cluster for an initial ingestion of data, or a large bulk import operation. In most cases allow the balancer to create and balance chunks in sharded clusters
Besides, this only splits the data; to reach your goal, you would end up with one full shard and one empty shard.
Replication
The replication approach is much safer and easier to achieve. You can simply configure a replica set and add your second server to that set.
If the data set is large, you can configure your second server as a hidden member so that no reads are sent to it and no inconsistent data is returned while it catches up. Once replication has finished, you will have a copy of your data on both servers.
As for using both servers as a single server: if you need to balance requests between the two, you can set your readPreference to secondary, which ensures that all reads are sent to the secondary server, while writes go to the primary.
In this case your code will be unaware of which server you are querying. You just call your client methods, and the driver handles the rest behind the scenes.
Conclusion
So my advice would be to use the replication approach, as it is the cleaner, less painful, and safer solution.

Consistency for read from distributed databases

I have a set of databases distributed across multiple locations in the network and, for example, one client that needs to store some data in those databases.
I need to make sure my data will always be stored.
I can't set up a replica set with sync/async replication, as that would force me to connect to one master, which is a single point of failure, so I send data from the client to all the databases I know. One database can fail to store the data, so I rely on the writes to the other databases. In the end I get different data sets stored in the DBs, though these sets overlap. (Ex. DB1 -> [1, 2, 3], DB2 -> [1, 3], DB3 -> [2, 3, 4])
How can I get consistent data when reading from these DBs? What techniques should I apply on the client that writes data and on the client that reads it, so that the data sets can be merged successfully (giving the reader [1, 2, 3, 4])?
What you're asking is basically an entire branch of computer science. It is very much a non-trivial problem and you will find that a surprising number of things are impossible.
Also note that simply saying "consistent" data is not a sufficient definition. There are all sorts of levels of consistency (read-your-own-writes, reads-follow-writes, monotonic read, linearizable, causal, etc.) I think you likely mean (in a very loose sense): consistency similar to what you get when you use just one database.
To answer your question directly, you want to decide on a read quorum size and a write quorum size. These sizes must be selected such that reads and writes will overlap by at least one database instance. If you want to optimize for write latency, use a smaller write quorum and do the opposite if you want to optimize for read latency.
A more detailed exposition of overlapping read/write quorums can be found in Weighted Voting for Replicated Data. This is considered a seminal work in the field of replication.
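The overlapping-quorum idea can be sketched in a few lines. This is a toy in-memory model, not a real client: the values N, W, and R and both function names are assumptions chosen for illustration, and each "database" is just a Python set of append-only items.

```python
N = 3  # number of independent databases (replicas)
W = 2  # write quorum: a write counts as successful only if W replicas store it
R = 2  # read quorum: every read consults R replicas
assert R + W > N, "quorums must overlap in at least one replica"

def quorum_write(replicas, reachable, value):
    """Store value on the reachable replicas; refuse if fewer than W are reachable."""
    if len(reachable) < W:
        return False  # not enough replicas: report failure to the caller
    for i in reachable:
        replicas[i].add(value)
    return True

def quorum_read(replicas, chosen):
    """Merge the contents of at least R replicas into one result set."""
    assert len(chosen) >= R
    return set().union(*(replicas[i] for i in chosen))

replicas = [set() for _ in range(N)]
quorum_write(replicas, [0, 1], 1)     # third database unreachable
quorum_write(replicas, [0, 2], 2)     # second database unreachable
quorum_write(replicas, [0, 1, 2], 3)  # all reachable
quorum_write(replicas, [1, 2], 4)     # first database unreachable

print(sorted(quorum_read(replicas, [1, 2])))
```

Because every successful write reached at least W replicas and R + W > N, any R replicas together contain every acknowledged item. In the example from the question, item 4 lives on only one database, so under W = 2 that write would have been reported to the client as failed.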
Also be careful about the behavior of your overlapping quorums when adding or removing a database instance. It sounds like you have a relatively static topology, but if that is not the case, then an entirely different set of choices needs to be made.
Lastly, and here's the real kick in the teeth, what I have described doesn't actually give you consistency (by any definition) in some cases (I like Daniel Abadi's explanation of when and why), but for many systems it gives you good enough consistency. It's up to you to decide exactly what level of consistency you need.
There are two-way/three-way replication products that do not require a "master".
You can also use transaction log based replications.
What and how you can use will depend on the database product you use.
HTH

Is it possible to default all MongoDB writes to safe? What is the performance hit from doing this?

For MongoDB 2.2.2, is it possible to default all writes to safe, or do you have to include the right flags for each write operation individually?
If you use safe writes, what is the performance hit?
We're using MongoMapper on Rails.
If you are using the latest version of 10gen official drivers, then the default actually is safe, not fire-and-forget (which used to be the default).
You can read this blog post by 10gen co-founder and CTO which explains some history and announces that all MongoDB drivers as of late November use "safe" mode by default rather than "fire-and-forget".
MongoMapper is built on top of the 10gen-supported Ruby driver, and its maintainers have also updated their code to be consistent with the new defaults. You can see the check-in and comments here for the master branch. Since I'm not certain what their release schedule is, I would recommend you ask on the MongoMapper mailing list.
Even prior to this change you could set "safe" value on connection level in MongoMapper which is as good as global. Starting with 0.11, you can do it in mongo.yml file. You can see how in the release notes.
The bottom line is that you don't have to specify safe mode for each write, although you can still specify higher or lower durability than the default for any individual write operation if you wish. Once you switch to the latest versions of everything, you will be using "safe writes" globally by default.
I do not use mongomapper so I can only answer a little.
In terms of the database, it depends. A safe write is basically (basically being the keyword) waiting for the database to do what it would normally do after you got a default "I am done" response from a fire-and-forget write.
There is more work depending on how safe you want the write to be. A good example is a write to a single node and one to many nodes. If you write to that single node you will get a quicker response from the database than if you wish to replicate the command (safely) to other nodes in the network.
Any amount of safe writing does, of course, cause a performance hit in the number of writes you can send to the server: more work is required before a response is given, which means fewer writes can be thrown at the database. The key thing is getting the balance between speed and durability of your data just right for your application.
Many drivers (including MongoDB PHP 1.3, using a write concern of 1: http://php.net/manual/en/mongo.writeconcerns.php ) now use normal safe writes by default, and fire-and-forget writes are being phased out as the default.
By looking at the mongomapper documentation: http://mongomapper.com/documentation/plugins/safe.html it seems you must still add the flags everywhere.

NoSQL databases: what about read consistency?

From what I can make out, NoSQL databases might be a good option for read-intensive applications, but are a less good fit if you also need to do a lot of data updates and transactionality is very important to you (what with there being no ACID compliance). Right? Too simplistic maybe.
But anyway, supposing I'm partly right at least I'm now concerned about how NoSQL databases maintain a "read consistent" view of the data that you're either reading or writing. Or do they? And if they don't, isn't that a really big problem?
I mean, if the data that you're reading (or updating) is changing as you read it then you're potentially going to get an inconsistent/dirty result set. Coming from an Oracle rdbms background, where all this is just handled for you, I find it confusing how the lack of read consistency is anything but a big problem. Could well be though that I'm missing some key point about all this. Can someone set me straight?
I am a developer on the Oracle NoSQL Database and will answer your question relative to that particular NoSQL system.
The Oracle NoSQL Database API allows the programmer to specify -- with each API call -- the level of read consistency. The four possible values, ranging from strictest to loosest, are Absolute, Time, Version, and None. Absolute says to always read from the replication master so that the most current value is returned. "Time" says that the system can return a value from any replica that is at least within a certain time delta of the master (e.g. read the value from any replica that is within 2 seconds of the master). Every read and write call to the system returns a "version handle". This version handle may be passed into any read call when Consistency.Version is specified and it tells the system to read from any replica which is at least as up to date as that version. This is useful for Read Modify Write (aka CAS) scenarios. The last value, Consistency.None says that any replica can be used (i.e. there is no consistency guaranteed).
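To make the four levels concrete, here is a toy model of which replicas would be allowed to serve a read at each level. It is not the Oracle NoSQL Database API: the `Replica` class, the `eligible_replicas` function, and the sample node data are all hypothetical and only mirror the description above.

```python
from dataclasses import dataclass

@dataclass
class Replica:
    name: str
    version: int         # highest version this replica has applied
    lag_seconds: float   # how far it trails the master
    is_master: bool = False

def eligible_replicas(replicas, consistency, max_lag=None, min_version=None):
    """Return the names of replicas allowed to serve a read at the given level."""
    if consistency == "absolute":
        ok = [r for r in replicas if r.is_master]
    elif consistency == "time":
        ok = [r for r in replicas if r.lag_seconds <= max_lag]
    elif consistency == "version":
        ok = [r for r in replicas if r.version >= min_version]
    elif consistency == "none":
        ok = list(replicas)
    else:
        raise ValueError(f"unknown consistency level: {consistency}")
    return [r.name for r in ok]

nodes = [
    Replica("master", version=10, lag_seconds=0.0, is_master=True),
    Replica("replica-a", version=9, lag_seconds=1.5),
    Replica("replica-b", version=7, lag_seconds=6.0),
]

for level, kwargs in [("absolute", {}), ("time", {"max_lag": 2.0}),
                      ("version", {"min_version": 9}), ("none", {})]:
    print(level, eligible_replicas(nodes, level, **kwargs))
```

The stricter the level, the fewer replicas qualify, which is the trade-off between freshness and read availability that the API exposes per call.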
I hope this is helpful.
Charles Lamb
A NoSQL database can be read-consistent, although it's generally not a big problem if it's not strictly so; check out the CAP theorem. There's been quite a lot of research done in this area. I recommend reading Amazon's Dynamo paper for a quick view of some of the problems faced by distributed systems like NoSQL databases, and their solutions.
MongoDB allows the application to select the desired level of read consistency using "write concern". This concept allows your application to block until a certain condition is met for a given write.
By way of example, you can consider any write successful so long as the operation is communicated to a master server. Alternatively, you can block until a write has been propagated to a majority of nodes in your replica set. In this way, you can mix performance/consistency to taste.
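The two conditions just described can be modelled in a few lines. This is a simplified sketch of the idea, not MongoDB's implementation (which counts voting members and has further options); the function names are illustrative, while the values `1` and `"majority"` mirror real write concern settings.

```python
def majority(member_count):
    """Smallest number of members that forms a strict majority."""
    return member_count // 2 + 1

def is_acknowledged(acks, member_count, w):
    """Decide whether a write with `acks` confirmations satisfies write concern w."""
    required = majority(member_count) if w == "majority" else int(w)
    return acks >= required

# Three-member replica set: only the primary has confirmed the write so far.
print(is_acknowledged(acks=1, member_count=3, w=1))
print(is_acknowledged(acks=1, member_count=3, w="majority"))
```

With `w=1` the client unblocks as soon as the primary confirms; with `w="majority"` it keeps waiting until a second member has the write, trading latency for durability.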
It depends on the NoSQL database you are using, as each implements a different strategy. You can read, for example, Riak's explanation of their "eventual consistency" model or Lars Hofhansel's writeup on ACID in HBase.