Settings for Pgpool-II and 2 read replicas; goal is to split connections evenly between the two replicas - pgpool

My setup:
Pgpool-II 4.0.2
OS: Ubuntu
2 AWS RDS read replicas (the master DB is not part of the setup)
pgpool mode: master_slave mode + sub-mode streaming replication
Purpose of using pgpool (not yet achieved):
Evenly split incoming DB connections between the two replicas, e.g. when 20 DB connections come in to pgpool, pgpool will open 10 connections to replica 1 and 10 connections to replica 2.
Things my current setup can do:
Query load balancing, connection caching, watchdog failover.

I got a reply from an official Pgpool-II developer: Pgpool-II does not split connections; it load-balances queries only, not connections.
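For anyone with the same goal: per-query load balancing across the two replicas is still achievable. A minimal pgpool.conf sketch of that (the hostnames are placeholders, not my actual endpoints):

# Two RDS read replicas as backends
backend_hostname0 = 'replica1.example.rds.amazonaws.com'
backend_port0 = 5432
backend_weight0 = 1
backend_hostname1 = 'replica2.example.rds.amazonaws.com'
backend_port1 = 5432
backend_weight1 = 1
# Equal weights split SELECT traffic roughly 50/50
load_balance_mode = on
master_slave_mode = on
master_slave_sub_mode = 'stream'

pgpool picks a load-balance node per client session according to these weights, so over many sessions the SELECT traffic evens out even though the connection counts do not.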

Related

How to scale the total number of connections with pgpool load balancing?

I have 3 PostgreSQL databases (one master and two slaves) behind a pgpool; each database can handle 200 connections, and I want to be able to get 600 active connections through pgpool.
My problem is that if I configure pgpool with 600 child processes, it can open all 600 connections against a single database (the master, for example, if every connection issues a write query), but with 200 child processes I only get roughly 70 connections on each database.
So is there a way to configure pgpool so that load balancing scales with the number of databases?
Thanks.
Having 600 connections available on each DB is not an ideal solution. I would really look into the application before setting such a high connection count.
The load-balancing scalability of pgpool can be increased by setting equal backend_weight parameters, so that SQL queries get distributed evenly among the PostgreSQL nodes.
pgpool also manages its database connection pool using the num_init_children and max_pool parameters.
The num_init_children parameter controls how many pgpool child processes are spawned to connect to the PostgreSQL backends.
The num_init_children value is also the number of concurrent clients allowed to connect to pgpool.
pgpool roughly tries to make max_pool * num_init_children connections to each PostgreSQL backend (see the sketch below).
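As a rough sketch of that arithmetic (values are illustrative, assuming each PostgreSQL node has max_connections = 200): since every pgpool child may open a connection to every backend, it is the per-backend product max_pool * num_init_children that has to fit under each node's limit, not the sum across nodes:

# Keep max_pool * num_init_children below each backend's
# (max_connections - superuser_reserved_connections)
num_init_children = 190    # concurrent client slots at pgpool
max_pool = 1               # cached backend connections per child process
backend_weight0 = 1        # master
backend_weight1 = 1        # slave 1
backend_weight2 = 1        # slave 2; equal weights spread SELECTs evenly

In other words, three 200-connection nodes behind one pgpool do not add up to 600 concurrent clients; the ceiling is still set per backend.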

Pgpool executes queries on standby nodes instead of the master when standby replication is behind

I have a PostgreSQL 10 master DB with 2 hot-standby servers using streaming replication, and the replication is working correctly. synchronous_commit is set to remote_write.
I also have pgpool 3.7.5 configured with these parameters:
delay_threshold = 1
sr_check_period = 1
And the following backend weights:
master: 1
node1: 3
node2: 3
In the log I can see that node1 and node2 are lagging:
Replication of node:1 is behind 75016 bytes from the primary server (node:0)
The pgpool docs say:
delay_threshold (integer)
Specifies the maximum tolerance level of replication delay in WAL bytes on the standby server against the primary server. If the delay exceeds this configured level, Pgpool-II stops sending the SELECT queries to the standby server and starts routing everything to the primary server even if load_balance_mode is enabled, until the standby catches-up with the primary. Setting this parameter to 0 disables the delay checking. This delay threshold check is performed every sr_check_period. Default is 0.
The problem is that pgpool sends queries to the hot standbys before they have received the new data from the master through streaming replication.
I temporarily enabled log_per_node_statement = on to see on which node each query executes, and I can see queries being sent to the standbys even when they aren't in sync, which delay_threshold should prevent.
Am I missing something? When the standbys are behind the master, aren't the queries supposed to go to the master?
Thanks in advance.
Other config values of pgpool are:
num_init_children = 120
max_pool = 3
connection_cache = off
load_balance_mode = on
master_slave_sub_mode = 'stream'
replication_mode = off
sr_check_period = 1
First, I think you should check the result of "show pool_nodes" and verify that the three nodes are properly set up with the right roles (primary, standby, standby).
Second, did you set "app_name_redirect_preference_list" or "database_redirect_preference_list"? If so, that can affect which node is selected for a SELECT query.
Also, in my opinion delay_threshold = 1 is very strict; the unit is bytes, and in my case I use 10000000 in production. Alternatively, why not put a /*NO LOAD BALANCE*/ comment on specific queries to send them only to the master (example below)?
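For example (the table name here is made up), a query that must not be load-balanced would look like this; pgpool only honors the comment when it appears at the very start of the query text:

/*NO LOAD BALANCE*/ SELECT * FROM orders WHERE id = 42;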
And I simply recommend upgrading pgpool to version 4.0.0 (released 2018-10-19); 3.7.x has a mysterious bug in load balancing.
I also faced a similar problem: load balancing was not working properly with that version (3.7.5) even though our configuration had no problem, and pgpool picked nodes seemingly at random. We even contacted the pgpool developer team to solve this, but they couldn't find the root cause.
You can check the details at the link below:
https://www.pgpool.net/mantisbt/view.php?id=435
And this was resolved like a charm by upgrading to version 4.0.0.

MongoDB nodes (AWS EC2 instances) are still responsive even after network partitioning via Security Groups

I have created a MongoDB replica set using 5 EC2 instances on AWS. I added the nodes using the rs.add("[IP_Address]") command.
I want to create a network partition in the replica set. To do that, I have defined 2 kinds of security groups: 'SG1' has port 27017 (the MongoDB port) open; 'SG2' doesn't expose 27017.
I want to isolate 2 nodes from the replica set. When I apply SG2 to these 2 nodes (EC2 instances), they should ideally stop receiving reads and writes from the primary, since I am blocking port 27017 with security group SG2. But in my case they are still writable: data written on the primary is reflected on the partitioned nodes. Can someone help? TYA.
Most firewalls, including AWS security groups, block incoming connections at the moment the connection is opened. Changing the settings affects all new connections, but existing open connections are not re-evaluated when the new rules are applied.
MongoDB maintains long-lived connections between hosts, and those would only get blocked after the connection between the hosts is lost and has to be re-established.
On Linux you can restart the networking stack, which resets the connections. You can do this after applying the new rules by running:
/etc/init.d/networking stop && /etc/init.d/networking start
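If you want to verify that surviving sessions are the cause (a quick check, assuming a Linux host with the iproute2 ss tool), list the established connections on the MongoDB port before and after applying SG2:

# Show TCP connections to/from port 27017 that survived the rule change
ss -tn state established '( dport = :27017 or sport = :27017 )'

Any sessions still listed were opened before the security-group change and will keep replicating until they are reset.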

MongoDB sharding: mongos and configuration servers together?

We want to create a MongoDB shard cluster (v2.4). The official documentation recommends having 3 config servers.
However, our company's policies won't allow us to get 3 extra servers for this purpose. Since we already have 3 application servers (1 web node, 2 process nodes), we are considering putting the config servers on those same application servers, alongside the mongos. Availability is not critical for us.
What do you think about this configuration? Could we face any problems, or is it discouraged for some reason?
Given that availability is not critical for your use case, I would say it should be fine to place the config servers on the same servers as the application and mongos.
If one of the process nodes goes down, you will lose 1 mongos, 1 application server, and 1 config server. During this downtime the other two config servers will be read-only, which means there will be no balancing of shards, no modifications to the cluster config, etc., although your other two mongos should still be operational (CRUD-wise). If your web node goes down, you have a bigger problem to deal with.
If two of the nodes are down (2 process nodes, or 1 web server and 1 process node), again you have a bigger problem to deal with, i.e. your applications are probably not going to work anyway.
Having said that, please consider whether these nodes have the capacity (CPU, RAM, network connections, etc.) to handle a mongos, an application server, and a config server.
I would recommend testing the deployment architecture in a development/staging cluster first, under your typical workload and use case.
Also see Sharded Cluster High Availability for more info.
Lastly, I would recommend checking out MongoDB v3.2, which is the current stable release. The config servers in v3.2 are modelled as a replica set; see Sharded Cluster Config Servers for more info.
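For reference, in v3.2 the three config servers are started as members of a single replica set, roughly like this (the hostnames and replica set name are placeholders):

# On each of the three config server hosts (then rs.initiate() on one of them)
mongod --configsvr --replSet configReplSet --port 27019 --dbpath /data/configdb
# mongos then points at the replica set rather than at three individual hosts
mongos --configdb configReplSet/cfg1.example.net:27019,cfg2.example.net:27019,cfg3.example.net:27019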

How to shift internal communication of nodes in a MongoDB cluster to another network to decrease the load on the main network

I have created an 8-node MongoDB cluster with 2 shards + 2 replicas (1 for each shard) + 3 config servers + 1 mongos.
All of these are on network 192.168.1.x (eth0) together with the application server, so this network is handling all the traffic.
So I have created another network, 192.168.10.x (eth1), which contains only these 8 MongoDB nodes.
Now all eight nodes are part of both networks, with dual IPs.
I want to shift the internal traffic between these MongoDB nodes to network 192.168.10.x (eth1) to reduce the load on the main network 192.168.1.x (eth0).
So how do I bind the ports/nodes for this purpose?
You can use bind_ip as a startup or configuration-file option. Keep in mind that the various nodes still need to be accessible to each other in the event of failover.
Notable here is your single mongos: it would be advisable either to co-locate one mongos per app server or, depending on requirements, to have a pool of them available to your driver connection. Preferably both, with a large instance for each mongos where aggregate operations are used.
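As a minimal sketch (the 192.168.10.x addresses are examples; adjust per host): bind_ip only controls which interfaces each process listens on, so the cluster members also have to be added by their 192.168.10.x addresses (e.g. rs.add("192.168.10.22:27017")) for the replication traffic to actually use eth1.

# Start each mongod listening on the loopback and its eth1 address only
mongod --bind_ip 127.0.0.1,192.168.10.21 --port 27017 --dbpath /data/db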
I found the solution to the problem I was looking for: I configured the cluster using the IPs on network 192.168.10.x (eth1).
Now the internal data traffic is going through this network.