What does the setting targetServerType with value preferSlave in the PostgreSQL JDBC driver really mean?
The reason why I am asking this question is according to the documentation:
targetServerType = String
Allows opening connections to only servers with the required state, the
allowed values are any, master, slave, secondary, preferSlave and
preferSecondary. The master/slave distinction is currently done by
observing if the server allows writes. The value preferSecondary tries
to connect to secondary if any are available, otherwise allows falls
back to connecting also to master.
Now I was trying this setup in Cloudfoundry and if I was to look at the metrics of PostgreSQL on the dashboard, I still see reads being done on the master. Hence my question. Isn't master nodes supposed to be used for reads in this case?
And how does it affect performance in terms of read/write. Especially in an application where the writes are done with the target as master and reads are done with the target preferslave?
Related
I am currently planning some server infrastructure. I have two servers in different locations. My apps (apis and stuff) are running on both of them. The client connects to the nearest (best connection). In case of failure of one server the other can process the requests. I want to use mongodb for my projects. The first idea is to use a replica set, therefore I can ensure the data is consistent. If one server fails the data is still accessible and the secondary switches to primary. When the app on the primary server wants to use the data, it is fine, but the other server must connect to to the primary server in order to handle data (that would solve the failover, but not the "best connection" problem). In Mongodb there is an option to read data from secondary servers, but then I have to ensure, that the inserts (only possible on primary) are consistent on every secondary. There is also an option for this "writeConcern". Is it possible to somehow specify “writeConcern on specific secondary”? Because If an add a second secondary without the apps on it, "writeConcern" on every secondary would not be necessary. And if I specify a specific value I don't really know on which secondary the data is available, right ?
Summary: I want to reduce the connections between the servers when the api is called.
Please share some thought or Ideas to fix my problem.
Writes can only be done on primaries.
To control which secondary the reads are directed to, you can use max staleness as well as tags.
that the inserts (only possible on primary) are consistent on every secondary.
I don't understand what you mean by this phrase.
If you have two geographically separated datacenters, A and B, it is physically impossible to write data in A and instantly see it in B. You must either wait for the write to propagate or wait for the read to fetch data from the remote node.
To pay the cost at write time, set your write concern to the number of nodes in the deployment (2, in your proposal). To pay the cost at read time, use primary reads.
Note that merely setting write concern equal to the number of nodes doesn't make all nodes have the same data at all times - it just makes your application only consider the write successful when all nodes have received it. The primary can still be ahead of a particular secondary in terms of operations committed.
And, as noted in comments, a two-node replica set will not accept writes unless both members are operational, which is why it is generally not a useful configuration to employ.
Summary: I want to reduce the connections between the servers when the api is called.
This has nothing to do with the rest of the question, and if you really mean this it's a premature optimization.
If what you want is faster network I/O I suggest looking into setting up better connectivity between your application and your database (for example, I imagine AWS would offer pretty good connectivity between their various regions).
I am just starting to use MongoDB while testing it with YCSB and I have a couple of questions about read preferences and its implementation.
I have setup 1 Primary and 2 Secondary nodes, and set reading preferences on YCSB java client like this mongo.setReadPreference(ReadPreference.secondary());
1. Why if I point YCSB to connect to primary node it still can perform read operations without generating error message? Also I checked the logs and I can see that Primary is the node that served these requests.
2 How do clients know about Secondary nodes in a production environment? Where do you connect clients by default? Do all the clients go to Primary, retrieve list of Secondaries and then reconnect to secondaries to perform reads ?
3 By browsing source code I have found that logic of selecting appropriate replica based on preferences is done in replica_set_monitor.cpp Although it is not yet clear to me where this code is executed, is it on Primary, Secondary or client?
Thank you
When your application connects only to the primary, it doesn't learn about any secondaries. ReadPreference.secondary() is just a preference, not a mandate. When the application doesn't know that a secondary exists, it will read from the primary.
To make your application aware of the secondaries, you need to use the class DBClientReplicaSet instead of DBClientConnection which takes an std::vector of hosts as a constructor argument. This array should include all members of the set.
When you would prefer to have the application unaware of the replica-set members, you could set up a sharded cluster (which might consist of only a single shard) and connect to the router. The mongos process will then handle the replica-set abstraction.
When an application connects to any active replica member it will issue a internal type of rs.status() which is infact a isMaster command (http://docs.mongodb.org/meta-driver/latest/legacy/connect-driver-to-replica-set/) and caches the response of that for a specific time until deemed fit to refresh that information, in fact in the c++ driver even tells you the class that will hold the cache: http://api.mongodb.org/cxx/current/classmongo_1_1_replica_set_monitor.html
Holds state about a replica set and provides a means to refresh the local view.
There are number of ways that the application can connect to a set to understand, the most common way is by providing a seed list into the connection string within your application code to the driver, that way it can connect to any member and ask: "What is there here?"
Obviously, I know why to use a replica set in general.
But, I'm confused about the difference between connecting directly to the PRIMARY mongo instance and connecting to the replica set. Specifically, if I am connecting to Mongo from my node.js app using Mongoose, is there a compelling reason to use connectSet() instead of connect()? I would assume that the failover benefits would still be present with connect(), but perhaps this is where I am wrong...
The reason I ask is that, in mongoose, the connectSet() method seems to be less documented and well-used. Yet, I cannot imagine a scenario where you would NOT want to connect to the set, since it is recommended to always run Mongo on a 3x+ replica set...
If you connect only to the primary then you get failover (that is, if the primary fails, there will be a brief pause until a new master is elected). Replication within the replica set also makes backups easier. A downside is that all writes and reads go to the single primary (a MongoDB replica set only has one primary at a time), so it can be a bottleneck.
Allowing connections to slaves, on the other hand, allows you to scale for reads (not for writes - those still have to go the primary). Your throughput is no longer limited by the spec of the machine running the primary node but can be spread around the slaves. However, you now have a new problem of stale reads; that is, there is a chance that you will read stale data from a slave.
Now think hard about how your application behaves. Is it read-heavy? How much does it need to scale? Can it cope with stale data in some circumstances?
Incidentally, the point of a minimum 3 members in the replica set is to offer resiliency and safe replication, not to provide multiple nodes to connect to. If you have 3 nodes and you lose one, you still have enough nodes to elect a new primary and have replication to a backup node.
I have decided to start developing a little web application in my spare time so I can learn about MongoDB. I was planning to get an Amazon AWS micro instance and start the development and the alpha stage there. However, I stumbled across a question here on Stack Overflow that concerned me:
But for durability, you need to use at least 2 mongodb server
instances as master/slave. Otherwise you can lose the last minute of
your data.
Is that true? Can't I just have my box with everything installed on it (Apache, PHP, MongoDB) and rely on the data being correctly stored? At least, there must be a config option in MongoDB to make it behave reliably even if installed on a single box - isn't there?
The information you have on master/slave setups is outdated. Running single-server MongoDB with journaling is a durable data store, so for use cases where you don't need replica sets or if you're in development stage, then journaling will work well.
However if you're in production, we recommend using replica sets. For the bare minimum set up, you would ideally run three (or more) instances of mongod, a 'primary' which receives reads and writes, a 'secondary' to which the writes from the primary are replicated, and an arbiter, a single instance of mongod that allows a vote to take place should the primary become unavailable. This 'automatic failover' means that, should your primary be unable to receive writes from your application at a given time, the secondary will become the primary and take over receiving data from your app.
You can read more about journaling here and replication here, and you should definitely familiarize yourself with the documentation in general in order to get a better sense of what MongoDB is all about.
Replication provides redundancy and increases data availability. With multiple copies of data on different database servers, replication protects a database from the loss of a single server. Replication also allows you to recover from hardware failure and service interruptions. With additional copies of the data, you can dedicate one to disaster recovery, reporting, or backup.
In some cases, you can use replication to increase read capacity. Clients have the ability to send read and write operations to different servers. You can also maintain copies in different data centers to increase the locality and availability of data for distributed applications.
Replication in MongoDB
A replica set is a group of mongod instances that host the same data set. One mongod, the primary, receives all write operations. All other instances, secondaries, apply operations from the primary so that they have the same data set.
The primary accepts all write operations from clients. Replica set can have only one primary. Because only one member can accept write operations, replica sets provide strict consistency. To support replication, the primary logs all changes to its data sets in its oplog. See primary for more information.
More and more of the noSQL databases that are in the spotlight uses the master/slave pattern to provide "availability", but what it does (at least from my perspective) is creating the weak link in a chain that will break anytime. - Master goes down, slaves stops to function.
It's a great way to handle big amounts of data and to even out reads/writes, but seen in an availability perspective? Not so much...
I understand from some noSQL's that the slaves can be changed to be a master easily, but doing this would be a headache to handle in most applications. Right?
So how do you people take care of this sort of stuff? How does master/slave-databases work in the real world?
This is a fairly generalized question; can you specify what data stores you're specifically talking about?
I've been working with MongoDB, and it handles this very gracefully; every member in a "replica set" (basically, a master-slave cluster) is eligible to become the master. On connecting to a replica set, the set will tell the connecting client about every member in the set. If the master in the set goes offline, the slaves will automatically elect a new master, and the client (since it has a list of all nodes in the set) will try new nodes until it connects; the node it connects to will inform the client about the new master, and the client switches its connection over. This allows for fully transparent master/slave failover without any changes in your application.
This is obviously fine for single connections, but what about restarts? The MongoDB driver handles this as well; it can accept a list of nodes to try connecting to, and will try them in serial until it finds one to connect to. Once it connects, it will ask the node who its master is, and forward the connection there.
The net result is that if you have a replica set established, you can effectively just not worry that any single node exploding will take you offline.