Failover with mongodb - mongodb

I have to set up a database that can handle failover (if one instance crashes, the other takes over). For that, I decided to use mongodb:
I set up a replica set with two instances. Each instance is running on a separate VM. I have several questions:
It is recommended to use at least 3 instances in a replica set. Is it OK to use only two?
I have two instances, and therefore two IP addresses. Which IP should I give to my application that needs to read from and write to the database? When a database is down, how will requests be redirected to the instance that is still up?
Some help to get started would be great!

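It is recommended to use at least 3 instances in a replica set. Is it OK to use only two?
No. A two-member set will run, but if either member goes down the survivor cannot reach a majority of votes, so it cannot be elected (or remain) primary and there is no automatic failover. The recommended minimum for a replica set is three processes (docs). The third can be an arbiter, although a data-bearing member is preferable.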
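The arithmetic behind this can be sketched as follows. Electing a primary needs votes from a strict majority of the voting members; the function names are illustrative and not part of any MongoDB API.

```javascript
// Votes needed to elect a primary: a strict majority of voting members.
function majorityNeeded(votingMembers) {
  return Math.floor(votingMembers / 2) + 1;
}

// How many members can fail while a primary can still be elected.
function faultTolerance(votingMembers) {
  return votingMembers - majorityNeeded(votingMembers);
}

for (const n of [2, 3]) {
  console.log(n + ' members: majority ' + majorityNeeded(n) +
    ', tolerates ' + faultTolerance(n) + ' failure(s)');
}
```

With two members the tolerance is zero, which is why a third voting member (data-bearing or arbiter) is needed for failover.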
I have two instances, and therefore two IP addresses. Which IP should I give to my application that needs to read from and write to the database? When a database is down, how will requests be redirected to the instance that is still up?
There are two alternatives:
#1 (recommended)
You provide the driver with all addresses (see the docs for details). The example below uses the Node.js driver; other drivers are similar. This way the driver knows all, or at least more than one, of the instances directly, which prevents problems when one of the specified instances is down (see #2).
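var MongoClient = require('mongodb').MongoClient;
// List every replica set member so the driver can still connect if one is down.
MongoClient.connect('mongodb://[server1],[server2],[...]/[database]?replicaSet=[name]', function(err, db) {
  if (err) throw err;
  // use db here
});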
#2
You provide the driver with only one of them (probably the primary) and it discovers the rest of the replica set from that member. However, if your app starts while the specified instance is down, the driver cannot find the other instances and therefore cannot connect to mongodb.
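For alternative #1, the connection string can be assembled from a list of hosts. This is a minimal sketch: buildReplicaSetUri is a hypothetical helper, not a driver function, and the host names, database name and replica set name are placeholders.

```javascript
// Hypothetical helper: builds a replica set connection string from a host list.
function buildReplicaSetUri(hosts, database, replicaSetName) {
  if (!Array.isArray(hosts) || hosts.length === 0) {
    throw new Error('at least one host is required');
  }
  return 'mongodb://' + hosts.join(',') + '/' + encodeURIComponent(database) +
    '?replicaSet=' + encodeURIComponent(replicaSetName);
}

// Placeholder host names; substitute the addresses of your own VMs.
const uri = buildReplicaSetUri(
  ['vm1.example.com:27017', 'vm2.example.com:27017'], 'mydb', 'rs0');
console.log(uri);
```

The resulting string is what would be passed to MongoClient.connect in place of the bracketed template.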

Related

Can A Postgres Replication Publication And Subscription Exist On The Same Server

I have a request asking for a read only schema replica for a role in postgresql. After reading documentation and better understanding replication in postgresql, I'm trying to identify whether or not I can create the publisher and subscriber within the same database.
Any thoughts on the best approach without having a second server would be greatly appreciated.
You asked two different questions. Same database? No. Since Pub/Sub requires tables to have the same name (including schema) on both ends, you would be trying to replicate a table onto itself. Using logical replication plugins other than the built-in one might get around this restriction.
Same server? Yes. You can replicate between two databases of the same instance (but see the note in the docs about some extra hoops you need to jump through) or between two instances on the same host. So whichever of those things you meant by "same server", yes, you can.
But it seems like an odd way to do this. If the access is read only, why does it matter whether it is to a replica of the real data or to the real data itself?

Postgres replication between 2 databases on same server

I need to create a replica of an existing database that copies every change from the master to the slave, i.e. some sort of mirror. I found a lot of examples on the web, but they all describe the case where master and slave are on different servers.
I would like to create a write replica on the same server where the master is located, without spinning up a second instance of Postgres.
Is it possible to do so, and could you point me to where I could find a solution?
Thank you.
P.S. I understand that replication on 2 servers is better, but I just need to do it on one common server.
If you want physical replication, you will need to run two instances of PostgreSQL. If they are on the same server machine, they will need to have different port numbers. The different port numbers are the only complexity; otherwise it is just like running on two different servers.
If you want logical replication, you can do that within a single instance, but you will need to jump through some hoops to create the subscription intra-instance, as described in the "Notes" section of the CREATE SUBSCRIPTION documentation.
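The intra-instance hoops can be sketched as three statements. This is an illustration under stated assumptions, not a tested recipe: sourceDb and targetDb stand for connected clients to the two databases that expose an async query(sql) method, all object names (app_pub, app_sub, app_slot, my_table, source_db) are made up, the server is assumed to run with wal_level = logical, and my_table must already exist in the target database.

```javascript
// Sketch only: sourceDb and targetDb are objects with an async query(sql)
// method, such as connected clients from a PostgreSQL driver.
async function setUpIntraInstanceReplication(sourceDb, targetDb) {
  // Publish the table on the source database.
  await sourceDb.query('CREATE PUBLICATION app_pub FOR TABLE my_table');
  // Create the slot separately: letting CREATE SUBSCRIPTION create it
  // against the same instance would hang.
  await sourceDb.query(
    "SELECT pg_create_logical_replication_slot('app_slot', 'pgoutput')");
  // Subscribe from the target database, reusing the existing slot.
  await targetDb.query(
    "CREATE SUBSCRIPTION app_sub CONNECTION 'dbname=source_db' " +
    "PUBLICATION app_pub WITH (create_slot = false, slot_name = 'app_slot')");
}
```

The key point is the order: the replication slot is created first, and the subscription is told not to create one.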
You could consider using a simple trigger to insert/update/delete data in the other database as soon as the main one is modified.
A more "professional" way would be to use synchronous replication.

Why isn't "connect" option in mongo connection string documented?

The issue was that even if I target just one node of my replica set in my connection string, mongo-go-driver always wants to discover and connect to the other nodes.
I found a solution here that basically says I should add the connect option to the connection string.
mongodb://host:27017/authDb?connect=direct
My question is: how good or bad a practice is this, why doesn't mongo document it, and are there other values this option can take?
That option only exists for the Go driver. For all other drivers it is unrecognized, so it is not documented as a general connection string option.
It is documented for the Go Driver at https://godoc.org/go.mongodb.org/mongo-driver/mongo#example-Connect--Direct
How good or bad a practice is this, why doesn't mongo document it, and are there other values this option can take?
As pointed out in the accepted answer, this is documented in the driver documentation. Now for the other part of the question.
Generally speaking in the replica set context, you would want to connect to the topology instead of directly to a specific replica set member, with an exception for administrative purposes. Replication is designed to provide redundancy, and connecting directly to one member i.e. Primary is not recommended in case of fail-over.
All of the official MongoDB drivers follow the MongoDB Specifications. With regard to direct connections, the current requirement is in server-discovery-and-monitoring.rst#general-requirements:
Direct connections: A client MUST be able to connect to a single
server of any type. This includes querying hidden replica set members,
and connecting to uninitialized members (see RSGhost) in order to run
"replSetInitiate". Setting a read preference MUST NOT be necessary to
connect to a secondary. Of course, the secondary will reject all
operations done with the PRIMARY read preference because the slaveOk
bit is not set, but the initial connection itself succeeds. Drivers
MAY allow direct connections to arbiters (for example, to run
administrative commands).
It only specifies that a driver MUST be able to do so, but not how. The MongoDB Go driver is not the only driver that currently supports the direct option approach; the .NET/C# and Ruby drivers do as well.
Currently there is an open PR for the specifications to unify the behaviour. In the future, all drivers will have the same way of establishing a direct connection.

How should I manage postgres database handles in a serverless environment?

I have an API running in AWS Lambda and AWS Gateway using Up. My API creates a database connection on startup, and therefore Lambda does this when the function is triggered for the first time. My API is written in node using Express and pg-promise to connect to and query the database.
The problem is that Lambda creates new instances of the function as it sees fit, and sometimes it appears as though there are multiple instances of it at one time.
I keep running out of DB connections as my Lambda function is using up too many database handles. If I log into Postgres and look at the pg_stat_activity table I can see lots of connections to the database.
What is the recommended pattern for solving this issue? Can one limit the number of simultaneous instances of a function in Lambda? Can you share a connection pool across instances of a function (I doubt it).
UPDATE
AWS now provides a product called RDS Proxy which is a managed connection pooling solution to solve this very issue: https://aws.amazon.com/blogs/compute/using-amazon-rds-proxy-with-aws-lambda/
There are a couple of ways that you can run out of database connections:
1. You have more concurrent Lambda executions than you have available database connections. This is certainly possible.
2. Your Lambda function is opening database connections but not closing them. This is a likely culprit, since web frameworks tend to keep database connections open across requests (which is more efficient), but on Lambda have no opportunity to close them since AWS will silently terminate the instance.
You can solve 1 by controlling the number of available connections on the database server (the max_connections setting on PostgreSQL) and the maximum number of concurrent Lambda function invocations (as documented here). Of course, that just trades one problem for another, since Lambda will return 429 errors when it hits the limit.
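The relationship between the two limits is simple arithmetic, sketched here. maxSafeConcurrency is a hypothetical helper and the numbers are made up; it assumes every Lambda instance holds a fixed number of connections.

```javascript
// Hypothetical helper: largest Lambda concurrency limit that cannot exhaust
// the database, assuming each instance holds a fixed number of connections.
function maxSafeConcurrency(maxConnections, reservedConnections, connectionsPerInstance) {
  const available = maxConnections - reservedConnections;
  return Math.max(0, Math.floor(available / connectionsPerInstance));
}

// Made-up example: max_connections = 100, 10 kept back for admin and other
// clients, one connection per Lambda instance.
console.log(maxSafeConcurrency(100, 10, 1));
```

If each instance keeps a small pool instead of a single connection, the safe concurrency limit shrinks accordingly.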
Addressing 2 is more tricky. The traditional and right way of dealing with database connection exhaustion is to use connection pooling. But with Lambda you can't do that on the client, and with RDS you don't have the option to do that on the server. You could set up an intermediary persistent connection pooler, but that makes for a more complicated setup.
In the absence of pooling, one option is to create and destroy a database connection on each function invocation. Unfortunately that will add quite a bit of overhead and latency to your requests.
Another option is to carefully control your client-side and server-side connection parameters. The idea is first to have the database close connections after a relatively short idle time (on PostgreSQL this is controlled by the tcp_keepalives_* settings). Then, to make sure that the client never tries to use a closed connection, you set a connection timeout on the client (how to do so will be framework dependent) that is shorter than that value.
My hope is that AWS will give us a solution for this at some point (such as server-side RDS connection pooling). You can see various proposed solutions in this AWS forum thread.
You have two options to fix this:
You can tweak Postgres to disconnect those idle connections. This is the best way but may require some trial-and-error.
You have to make sure that you connect to the database inside your handler and disconnect before your function returns or exits. In express, you'll have to connect/disconnect while inside your route handlers.
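The connect-inside-the-handler option might look like the sketch below. It is written against an injected factory rather than a specific library: createClient is assumed to return an object with async connect(), query(sql) and end() methods (the Client class of the pg package has this shape, whereas the question uses pg-promise, so treat the wiring as an assumption). The query is a placeholder.

```javascript
// Sketch only: createClient is any factory returning an object with async
// connect(), query(sql) and end() methods (for example a PostgreSQL client).
function makeHandler(createClient) {
  return async function handler(event) {
    const client = createClient();
    await client.connect();
    try {
      // Placeholder query; replace with the real work of the route.
      const result = await client.query('SELECT 1 AS ok');
      return { statusCode: 200, body: JSON.stringify(result.rows) };
    } finally {
      // Always release the connection, even if the query throws.
      await client.end();
    }
  };
}
```

The try/finally is the important part: the connection is closed before the function returns, so an instance that is later frozen or terminated leaves nothing open on the server. The cost is a new connection on every invocation.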

Replicating data between 2 Mongo replica sets

I currently have a mongo replica set consisting of 1 primary and 2 slaves, which is used by a read-only application. I'm adding a 2nd read-only application that requires access to the same data. I am considering using the same RS for both applications, but was wondering if there's a way to create a specific type of configuration with Mongo that works something like this:
1 x primary that handles all writes but is not seen as part of a replica set by the applications, and then 2 sets of read-only secondaries. Each set of secondaries replicates writes from the primary. Conceptually, something like:
       /----> RS1: |Secondary1|Secondary2|..|SecondaryN| <--- App1
PRIMARY|=>
       \----> RS2: |Secondary1|Secondary2|..|SecondaryN| <--- App2
Is this sort of configuration possible at all? What similar architectures could I consider for this use-case?
Thanks in advance.
Brett
I came across a way to implement this using mongo tooling:
1. Create a replica set to use as a master. Data updates are written to this RS (represented by "PRIMARY" in the diagram). Do not enable authentication on this host.
2. Create 2 replica sets with the same data (completely independent of each other).
3. Schedule regular "mongooplog" runs, using #1 as --from and each of the RSs from #2 as --host (see the manual).
4. Authentication can be set up on the RSs from #2 so that applications only get read access to the data.
I haven't tested this yet, but from what I can tell, this approach should work for my objectives - is there anything I've overlooked in this approach?
edit
While trying this approach, I ran into issues using mongooplog with authentication on the destination. Over and above that, mongooplog doesn't cater for authentication on the source (the --from RS), so I wrote a tool to cater for this:
https://github.com/brettcave/mongo-oplogreplay
It supports authentication on both source and destination, as well as replicaset targets.