PostgreSQL streaming replication for high load

I am planning to migrate my production Oracle cluster to a PostgreSQL cluster. The current system supports 2000 TPS, and in order to sustain that rate, I would be very thankful if someone could clarify the points below.
1) What is the best replication strategy (streaming replication or DRBD-based replication)?
2) With streaming replication, can the master keep processing traffic without the slave, and when the slave comes back up, does it catch up on what it missed during the downtime?

About TPS: it depends mainly on your hardware and PostgreSQL configuration. I already wrote about it on Stack Overflow in this answer; you cannot expect magic from a notebook-like configuration. Here is my text "PostgreSQL and high data load application".
1) Streaming replication is the simplest and almost "painless" solution, so if you want to start quickly I highly recommend it.
2) Yes, but you have to archive the WAL segments. See below.
All that being said, here are some links I would recommend you read:
how to set up streaming replication
example of WAL log archiving script
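To make the WAL-archiving point concrete, here is a minimal sketch of the relevant settings, assuming a PostgreSQL 9.6-to-11-era setup (recovery.conf on the standby); the paths, host, and names are illustrative assumptions, not recommendations:

    # postgresql.conf on the master
    wal_level = replica        # required for streaming replication
    max_wal_senders = 5        # allow a few standby/backup connections
    archive_mode = on
    # copy each completed WAL segment to the archive (assumed path)
    archive_command = 'test ! -f /mnt/wal_archive/%f && cp %p /mnt/wal_archive/%f'

    # recovery.conf on the slave
    standby_mode = 'on'
    primary_conninfo = 'host=master.example.com port=5432 user=repl application_name=standby1'
    # if the slave falls behind the master's retained WAL, restore from the archive
    restore_command = 'cp /mnt/wal_archive/%f %p'

This is also why the answer to question 2 is "yes": while the slave is down, the master keeps archiving WAL, and the slave replays the archived segments when it comes back.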
But of course streaming replication has some caveats you should know about:
problems with increasing certain parameters like max_connections
how to add new disk and new tablespace to master and replicas

There is no “best solution” in this case. Which solution you pick depends on your requirements.
Do you need a guarantee of no data loss?
How big a performance hit can you tolerate?
Do you need failover or just a backup?
Do you need PITR (Point In Time Recovery)?
By default, I think a failed slave is simply ignored. Depending on your configuration, the slave might take a long time to catch up again after a reboot, for example.
I'd recommend you read https://www.postgresql.org/docs/10/static/different-replication-solutions.html
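To connect these questions: if the answer to the data-loss question is "yes", synchronous replication is the usual lever. A minimal sketch for the primary, assuming the standby connects with application_name=standby1:

    # postgresql.conf on the primary
    synchronous_standby_names = 'standby1'  # wait for this standby
    synchronous_commit = on                 # each commit blocks until the standby confirms

The performance question then matters directly, because every commit now waits for a network round trip to the standby.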

Related

Citus: can I add one more replica for a distributed table?

I have a distributed table, but it only has one replica, and one replica doesn't give me HA. I want one more replica for the table. Can I add one, and how?
I have searched the online help docs but didn't find any solution.
Replicas in Citus are not an HA solution. For HA you will need to set up Postgres tooling for every member in your cluster to stream WAL to another node. Citus specializes in distributed queries and separates that problem from HA by relying on proven technology available in the Postgres ecosystem.
If you want to scale out reads, adding a replica can help. However, replicas have a significant impact on write throughput, so before adding them, please thoroughly test that your database can still handle your expected load. And once again: if HA is your goal, don't add Citus replicas; instead, apply Postgres HA solutions to every worker and the coordinator.
For the reasons above, increasing the replica count of an already distributed table is not an operation Citus provides out of the box. The easiest approach is to create a new table and use INSERT INTO ... SELECT to reinsert the data into a table distributed with the shard count and replication factor your application needs, as sketched below.
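A rough sketch of that approach; the table name events, the distribution column tenant_id, and the setting values are hypothetical, and details vary by Citus version:

    -- assumed targets: 32 shards, 2 copies of each shard
    SET citus.shard_count = 32;
    SET citus.shard_replication_factor = 2;

    -- new table with the same schema, distributed under the new settings
    CREATE TABLE events_new (LIKE events INCLUDING ALL);
    SELECT create_distributed_table('events_new', 'tenant_id');

    -- copy the data over, then swap the tables in the application
    INSERT INTO events_new SELECT * FROM events;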

What is the appropriate value of the disable_load_balance_on_write parameter in pgpool.conf for a PostgreSQL async cluster?

I have faced a data-duplication issue due to my async cluster setup: data gets duplicated because of the replication delay between the primary and standby servers. Synchronous replication is a good option for data accuracy, but it degrades performance and has other issues, and I am not willing to accept the performance degradation of synchronous replication.
I found a parameter (disable_load_balance_on_write) in pgpool.conf which may solve this problem. It accepts four values:
transaction
trans_transaction
always
dml_adaptive
I have set "always" but here is an issue is that, it is reading all select queries from primary node whether it is read or write no matter where standby remain ideal.
My requirement is that after any insert, update, or delete, the affected data should be read from the primary node until the standby has copied the latest data, while SELECTs on data that has not been modified should still be read from the standby node.
What would be the appropriate configuration for these requirements? I need an expert suggestion on this.

Are there any performance effects on the master when using Postgres streaming replication with hot_standby_feedback on?

We are using Postgres 10, with a master and a hot standby connected via streaming replication.
We use the standby to spread the load of read queries.
We can't find information on how hot_standby_feedback will affect the master besides storage bloat due to delayed cleanup.
Will it have to perform significant work to decide if a query from the standby should delay cleanup?
If I understand it right, tuples that are no longer needed by any transaction are not removed until a HOT update or a vacuum happens. So the master does not have to make any decision unless one of those two occurs; thus the overall load should not be affected much by hot_standby_feedback, though vacuum will probably have to do some additional passes.
My assumptions are based purely on documentation and experience. I did not look into the source code...
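For completeness: the setting is enabled on the standby, and the primary exposes the feedback it receives, so you can watch the effect yourself. A minimal sketch:

    # postgresql.conf on the standby
    hot_standby_feedback = on   # report the standby's oldest snapshot (xmin) to the primary

    -- on the primary: the xmin each standby reported, which cleanup will honor
    SELECT application_name, backend_xmin FROM pg_stat_replication;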

Using HADR Standby as replication source

Trying to figure out if there is a way to replicate a subset of tables (and columns) from a HADR standby with ROS (reads on standby) enabled. A latency of O(10) seconds can be tolerated. We are using DB2 LUW V10.5 FP8 right now and will upgrade to V11 at some point.
I understand the read-only limitation on the HADR standby, and that eliminates some options, e.g. Q Replication / InfoSphere CDC, which write monitoring/metadata back to the source.
Furthermore, assuming we still limit ourselves to a DB2 environment and the replication user has read access to all the tables/columns, is there a replication tool that doesn't depend on a constant source-database connection? That is, a tool that only scans the transaction log files and writes its own metadata to external files, without touching the source database at all. It would be even better if it could capture once and replay to multiple targets.
Really appreciate your inputs.

Postgres streaming replication - slave-only indexes

We have successfully deployed Postgres 9.3 with streaming (WAL) replication. We currently have two slaves, the second being cascaded from the first. Both slaves are hot standbys with active read-only connections in use.
Due to load, we'd like to create a third slave with slightly different hardware specifications, used by a different application as a read-only database in more of a data-warehouse use case. As it's for a different application, we'd like to optimize it specifically for that application and improve performance with some additional indexes. For size and performance reasons, we'd rather not have those indexes on the master or the other two slaves.
So my main question is, can we create different indexes on slaves for streaming replication, and if not, is there another data warehouse technique that I'm missing out on?
So my main question is, can we create different indexes on slaves for streaming replication
No, you can't. Streaming physical replication works at a lower level than that, copying disk blocks around. It doesn't really pay attention to "this is an index update," "this is an insert into a table," etc. It does not have the information it'd need to maintain standby-only indexes.
and if not, is there another data warehouse technique that I'm missing out on?
Logical replication solutions like:
Londiste
pglogical
Slony-I
can do what you want. They send row changes, so the secondary server can have additional indexes.
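As an illustration of the idea, here is roughly what it looks like with the native logical replication that arrived later in PostgreSQL 10 (so not an option on 9.3 itself, where you'd use one of the tools above); the table and index names are hypothetical:

    -- On the primary (publisher):
    CREATE PUBLICATION dw_pub FOR TABLE orders, order_items;

    -- On the warehouse server (subscriber), after creating matching tables:
    CREATE SUBSCRIPTION dw_sub
        CONNECTION 'host=primary dbname=app user=repl'
        PUBLICATION dw_pub;

    -- The subscriber is a normal read/write instance, so it can carry
    -- indexes the primary does not have:
    CREATE INDEX orders_dw_idx ON orders (customer_id, created_at);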