MongoDB replica set in Azure, where do I point the firewall? - mongodb

I have a MongoDB replica set in Azure.
I have:
server1: Primary
server2: Secondary
server3: Arbiter
I have a dev environment on my local machine that I want to point to this MongoDB instance.
What do I open on my Azure firewall to make sure this configuration is set up with best practices?
Do I create a load-balanced endpoint to the primary and secondary, or do I create a single endpoint to the arbiter, or perhaps even something else?
thanks!

MongoDB will not play well with a load-balanced endpoint (as you might end up sending traffic to a secondary, and you'd have no control over this unless you implemented a custom probe for each VM, and then you'd need to update the probe's status based on the replicaset node's health, for each node). The MongoDB client-side driver is designed to work with a replicaset's topology to make the correct decision on which node to communicate with. Each replicaset node should have a discrete addressable ip:port. If you have all your instances in a single cloud service (e.g. myservice.cloudapp.net) then you'll need one port per instance (since they'd all share a single ip address). If each instance is in a different cloud service, then you can have the same port for each, with different dns name / ip address for each.
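For illustration, here is roughly what the client side looks like once each member has its own addressable endpoint (a minimal sketch in Python/PyMongo, assuming the single-cloud-service layout mentioned above with one forwarded port per instance; the ports and the replica set name rs0 are made-up examples, and the host names in the replica set configuration must match addresses your dev machine can reach):

```python
from pymongo import MongoClient

# One DNS name (the cloud service) with a distinct public port per member.
# The driver is handed every member's address and discovers the replica set
# topology itself, so no load-balanced endpoint is involved.
client = MongoClient(
    "mongodb://myservice.cloudapp.net:27017,"
    "myservice.cloudapp.net:27018,"
    "myservice.cloudapp.net:27019/?replicaSet=rs0"
)

print(client.primary)      # (host, port) of the member currently acting as primary
print(client.secondaries)  # the secondaries the driver is monitoring
```

On the Azure side that means one endpoint per VM (e.g. public ports 27017/27018/27019, each forwarding to that instance's mongod port), rather than a load-balanced set.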

Another option is to handle this with iptables: open the MongoDB port on each of the three nodes with a rule restricted to a specific source IP. That keeps the endpoints reachable only from the machines that need them while staying secure.

Related

How to connect a web server to a Kubernetes statefulset and headless service

I have been learning Kubernetes for a few weeks and now I am trying to figure out the right way to connect a web server to a statefulset correctly.
Let's say I deployed a master-slave Postgres statefulset and now I want to connect my web server to it. If I use a ClusterIP service, the requests will be load balanced across the master and the slaves for both reads (SELECT) and writes (UPDATE, INSERT, DELETE), right? But I can't do that, because write requests should only be handled by the master. However, if I point my web server at the master using the headless service (which gives us a DNS entry for each pod), I won't get any load balancing across the slave replicas and all of the requests will be handled by one instance, the master. So how am I supposed to connect them the right way, so that reads are load balanced across all replicas (including the slaves) while writes are forwarded to the master?
Should I use two endpoints in the web server and configure one for writing and one for reading records?
Or maybe I am using headless services and statefulsets the wrong way, since I am new to Kubernetes?
Well, your thinking is correct - the master should be read-write and the replicas should be read-only. How do you configure it properly? There are different possible approaches.
The first approach is what you are thinking about: set up two headless services - one for accessing the primary instances, the second one for accessing the replica instances. A good example is Kubegres:
In this example, Kubegres created 2 Kubernetes Headless services (of default type ClusterIP) using the name defined in YAML (e.g. "mypostgres"):
a Kubernetes service "mypostgres" allowing to access to the Primary PostgreSql instances
a Kubernetes service "mypostgres-replica" allowing to access to the Replica PostgreSql instances
Then you will have two endpoints:
Consequently, a client app running inside a Kubernetes cluster, would use the hostname "mypostgres" to connect to the Primary PostgreSql for read and write requests, and optionally it can also use the hostname "mypostgres-replica" to connect to any of the available Replica PostgreSql for read requests.
Check this starting guide for more details.
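For illustration, a client inside the cluster could then split its traffic across those two endpoints; a minimal sketch in Python with psycopg2, assuming the "mypostgres" / "mypostgres-replica" service names from the Kubegres example above (the database name, user, password and the events table are placeholders):

```python
import psycopg2

# Writes go to the primary via the "mypostgres" service.
write_conn = psycopg2.connect(host="mypostgres", port=5432,
                              dbname="app", user="app", password="secret")

# Reads can go to any replica via the "mypostgres-replica" service.
read_conn = psycopg2.connect(host="mypostgres-replica", port=5432,
                             dbname="app", user="app", password="secret")

with write_conn, write_conn.cursor() as cur:
    cur.execute("INSERT INTO events (payload) VALUES (%s)", ("hello",))

with read_conn, read_conn.cursor() as cur:
    cur.execute("SELECT count(*) FROM events")
    print(cur.fetchone())
```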
It's worth noting that there are many database solutions which use this approach - another example is MySQL. Here is a good article in the Kubernetes documentation about setting up MySQL using a StatefulSet.
Another approach is to use some middleware component which will act as a gatekeeper to the cluster, for example Pg-Pool:
Pg pool is a middleware component that sits in front of the Postgres servers and acts as a gatekeeper to the cluster.
It mainly serves two purposes: Load balancing & Limiting the requests.
Load Balancing: Pg pool takes connection requests and queries. It analyzes the query to decide where the query should be sent.
Read-only queries can be handled by read replicas. Write operations can only be handled by the primary server. In this way, it load balances the cluster.
Limits the requests: Like any other system, Postgres has a limit on no. of concurrent connections it can handle gracefully.
Pg-pool limits the no. of connections it takes up and queues up the remaining, thus gracefully handling the overload.
Then you will have one endpoint for all operations - the Pg-Pool service. Check this article for more details, including the whole setup process.
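The difference from the two-service approach is that the application no longer splits reads and writes itself; everything goes to the single Pg-Pool endpoint and the middleware routes each query. A sketch (the "pgpool" service name, credentials and table are placeholders):

```python
import psycopg2

# A single connection to the Pg-Pool service; Pg-Pool decides where each
# query actually runs (reads may be served by a replica, writes by the primary).
conn = psycopg2.connect(host="pgpool", port=5432,
                        dbname="app", user="app", password="secret")

with conn, conn.cursor() as cur:
    cur.execute("INSERT INTO events (payload) VALUES (%s)", ("hello",))
    cur.execute("SELECT count(*) FROM events")
    print(cur.fetchone())
```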

mongodb cluster with ELB endpoint as dns

This is not a technical question but more of an architectural one.
I have followed this blog for setting up the MongoDB cluster. We have 2 private subnets in which I have configured a 3-member replica set of MongoDB. Now I want to use a single DNS name like mongod.some_subdomain.example.com for the whole cluster.
I do not have access to Route53, and setting/updating the DNS records takes at least 2 hours in my case since I am dependent on our cloud support for it. I am not sure which server primarily responds to application requests in a MongoDB cluster.
So is there a way to put the whole cluster behind an ELB and use the ELB as the DNS name to route traffic to the primary, so that if there is a failover the next primary becomes the member behind the ELB (excluding the arbiter node)?
The driver will attempt to connect to all nodes in the replica set configuration. If you put nodes behind proxies the driver will bypass the proxies and try to talk to the nodes directly.
You can proxy standalone and sharded cluster deployments, as the driver doesn't need a direct connection to the data nodes in those, but mapping multiple mongos instances to a single address can create problems with retryable reads/writes, sessions, transactions, etc. This is not a supported configuration.
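To make the first point concrete: with a replica set connection string, the driver pulls the members' host:port values out of the replica set configuration and opens direct connections to each of them, which is why an ELB in front of the members effectively gets bypassed. A sketch with PyMongo and made-up hostnames:

```python
from pymongo import MongoClient

# The driver monitors every member listed in the replica set config and tracks
# which one is primary, so failover is handled client-side without an ELB.
client = MongoClient(
    "mongodb://mongo-a.internal:27017,mongo-b.internal:27017,mongo-c.internal:27017/"
    "?replicaSet=rs0"
)

print(client.primary)  # the driver's current view of the primary
```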

load balancing postgres instances via aws network balancer possible?

We have an application that has multiple Postgres databases (that are guaranteed to be in sync) installed on AWS EC2 instances in different availability zones. I would like to abstract them behind a single DNS name so that, if one of the EC2 instances crashes, the clients can still access the db. I am wondering if I can use an AWS network load balancer to load balance my databases? Why or why not? If not, is there any other standard, easy-to-implement solution that I can use? (I am aware of http://www.pgpool.net/mediawiki/index.php/Main_Page for example. However, I am leery of using something that I have to set up myself, especially since I would have to replicate the pgpool instances as well...)
Having just tried it myself, it does seem you can set up a network load balancer to load balance your databases. My production setup uses patroni to manage failover, and patroni provides an HTTP API for health checks. The master returns a 200 response, while the replicas return a 503. This works fine for my use case, where the replicas are there just for failover, not for replicated reads. I'm assuming you could come up with some code that returns a successful response for health checks based on your needs.
I configured the load balancer to listen to port 5432 and the health checks to connect on port 8008. I modified the security group for the postgres instances to allow connections from my VPC's IP range, since the NLB doesn't have security groups. Connecting via psql to the NLB's DNS name worked as expected.
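For reference, the health check Patroni exposes can be reproduced with a quick probe of its REST API; a sketch assuming Patroni's behaviour of answering GET / with 200 on the leader and 503 on replicas, on port 8008 (the node address below is made up):

```python
import requests

def is_primary(host: str, port: int = 8008) -> bool:
    """Return True if this Patroni node currently reports itself as the leader."""
    try:
        # Patroni answers GET / with 200 on the leader and 503 on replicas,
        # which is exactly what the NLB health check keys off.
        return requests.get(f"http://{host}:{port}/", timeout=2).status_code == 200
    except requests.RequestException:
        return False

print(is_primary("10.0.1.10"))  # hypothetical node address
```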
Though it works, I think I'll stick with my current setup, which has a PgBouncer running on each application instance (so no need to worry about managing a pool of bouncer instances) with consul-template updating pgbouncer.ini and reloading PgBouncer when the leader key changes in Consul.

Joining an external Node to an existing Kubernetes Cluster

I have a custom Kubernetes Cluster (deployed using kubeadm) running on Virtual Machines from an IAAS Provider. The Kubernetes Nodes have no Internet-facing IP Addresses (except for the Master Node, which I also use for Ingress).
I'm now trying to join a Machine to this Cluster that is not hosted by my main IAAS provider. I want to do this because I need specialized computing resources for my application that are not offered by the IAAS.
What is the best way to do this?
Here's what I've tried already:
Run the Cluster on Internet-facing IP Addresses
I have no trouble joining the Node when I tell kube-apiserver on the Master Node to listen on 0.0.0.0 and use public IP Addresses for every Node. However, this approach is non-ideal from a security perspective and also leads to higher costs, because public IP Addresses have to be leased for Nodes that normally don't need them.
Create a Tunnel to the Master Node using sshuttle
I've had moderate success by creating a tunnel from the external Machine to the Kubernetes Master Node using sshuttle, which is configured on my external Machine to route 10.0.0.0/8 through the tunnel. This works in principle, but it seems way too hacky and is also a bit unstable (sometimes the external machine can't get a route to the other nodes, I have yet to investigate this problem further).
Here are some ideas that could work, but I haven't tried yet because I don't favor these approaches:
Use a proper VPN
I could try to use a proper VPN tunnel to connect the Machine. I don't favor this solution because it would add an (admittedly quite small) overhead to the Cluster.
Use a cluster federation
It looks like kubefed was made specifically for this purpose. However, I think this is overkill in my case: I'm only trying to join a single external Machine to the Cluster. Using Kubefed would add a ton of overhead (Federation Control Plane on my Main Cluster + Single Host Kubernetes Deployment on the external machine).
I can't think of a better solution than a VPN here. Especially since you have only one isolated node, it should be relatively easy to make the handshake happen between this node and your master.
Routing the traffic from "internal" nodes to this isolated node is also trivial. Because all nodes already use the master as their default gateway, modifying the route table on the master is enough to forward the traffic from internal nodes to the isolated node through the tunnel.
You have to be careful with the configuration of your container network though. Depending on the solution you use to deploy it, you may have to assign a different subnet to the Docker bridge on the other side of the VPN.

How do I configure mongodb replicaset using elastic IP's in EC2?

tldr: What will I need to do in order to use an elastic IP in my MongoDB replicaset configuration?
We have a three-node MongoDB replicaset running on EC2. One of the instances in the set was retired by AWS yesterday, and so we were forced to stop and restart the EC2 instance.
Unfortunately, when we first configured the replicaset, we were fairly new to AWS and not aware that the public DNS address of the instances was subject to change. We used the public DNS of each instance in the replicaset configuration, and in all of the application connection strings in our code. After reading up on the subject yesterday, I tried to get the node back online by assigning an elastic IP to the instance and changing the replicaset configuration to use that IP. After some pain, I was able to get the other two nodes back up and running with that configuration, but the instance with the elastic IP refused to re-join the replicaset, and the error in mongod.log says:
[rsStart] replSet info self not present in the repl set configuration
After yet more reading, I found that I should not have used the actual elastic IP in the config, but rather the public DNS name of the elastic IP. My question is, before I take everything offline again to try this change, what exactly will I need to do in order to use the elastic IP in the replicaset configuration? I found some information on this 10Gen page: http://docs.mongodb.org/ecosystem/platforms/amazon-ec2/#communication-across-regions that made me think I might need to mess with the hostname of the instance and/or the hosts file, but I haven't been able to find anybody describing my exact scenario.
Any thoughts?
It turned out to be a pretty simple fix; once I changed the replicaset configuration to use the public DNS of the elastic IP, the mongo node came back online. I didn't have to touch the hostname or the hosts file.
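For anyone hitting the same thing, the reconfiguration itself can be scripted; a sketch using PyMongo's admin commands (both hostnames below are made-up examples, the second being the Elastic IP's public DNS name, and the member index will differ in your config):

```python
from pymongo import MongoClient

# Connect directly to the current primary (hypothetical address);
# replSetReconfig has to be run against the primary.
client = MongoClient("ec2-203-0-113-5.compute-1.amazonaws.com", 27017,
                     directConnection=True)

# Fetch the current config, point the affected member at the Elastic IP's
# public DNS name, bump the version and reconfigure.
config = client.admin.command("replSetGetConfig")["config"]
config["members"][2]["host"] = "ec2-198-51-100-10.compute-1.amazonaws.com:27017"
config["version"] += 1
client.admin.command({"replSetReconfig": config})
```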
You should never use an Elastic IP for internal traffic like replication. You will be charged $0.01/GB for this traffic, whereas using the internal IP would be free.
If you're using something like replica sets, you really should be running in a VPC. Unlike normal EC2 instances, instances in a VPC keep the same private IP addresses and Elastic IP addresses even when stopped.