I'm trying to figure out the logical replication protocol. My attention was drawn to the message "standby status update" (byte('r')). Unfortunately, the documentation does not seem to describe the expected server behavior for this command.
If I send an LSN from the past, will the server resend the transactions that come after that LSN? Or, as my own experiments suggest, does this message only affect the metadata of the replication slot (as shown in the pg_replication_slots view)?
That information is used to move the replication slot ahead and to provide the data that can be seen in pg_stat_replication (sent_lsn, write_lsn, flush_lsn and replay_lsn). With synchronous replication, it also provides the data that the primary has to wait for before it can report a transaction as committed.
Sending old, incorrect log sequence numbers will not change the data that the primary streams to the standby, but it might make the primary retain old WAL, because the replication slot is not moved ahead. It will also confuse monitoring. With synchronous replication, it will lead to hanging DML transactions.
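To make the message concrete, here is a minimal Python sketch of how a client could build the body of a standby status update. The field layout (Byte1('r'), three Int64 LSNs for written, flushed and applied WAL, an Int64 client clock in microseconds since 2000-01-01, and a Byte1 reply-requested flag) follows the streaming replication protocol documentation; the function names are my own, and the sketch leaves out the CopyData wrapper that carries the message on the wire.

```python
import struct
from datetime import datetime, timedelta, timezone

# The protocol counts time in microseconds since midnight 2000-01-01 (UTC).
PG_EPOCH = datetime(2000, 1, 1, tzinfo=timezone.utc)


def parse_lsn(text):
    """Convert an LSN in 'X/Y' hex notation (e.g. '0/16B3748') to an integer."""
    high, low = text.split("/")
    return (int(high, 16) << 32) | int(low, 16)


def pg_timestamp(now):
    """Microseconds between the PostgreSQL epoch and a timezone-aware datetime."""
    return (now - PG_EPOCH) // timedelta(microseconds=1)


def standby_status_update(write_lsn, flush_lsn, apply_lsn, now=None,
                          reply_requested=False):
    """Build the 34-byte message body (hypothetical helper, no CopyData framing)."""
    if now is None:
        now = datetime.now(timezone.utc)
    return struct.pack(
        "!cQQQqB",                    # network byte order, no padding
        b"r",                         # message type
        write_lsn,                    # last WAL byte + 1 written to disk
        flush_lsn,                    # last WAL byte + 1 flushed to disk
        apply_lsn,                    # last WAL byte + 1 applied
        pg_timestamp(now),            # client clock
        1 if reply_requested else 0,  # ask the server to reply immediately
    )
```

Reporting an older LSN with such a message only changes what the server records as confirmed; as described above, it does not rewind the stream.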
Related
I'm getting the following error when running a query on a PostgreSQL database in standby mode. The query works fine when it covers 1 month of data, but querying more than 1 month results in an error.
ERROR: canceling statement due to conflict with recovery
Detail: User query might have needed to see row versions that must be removed
Any suggestions on how to resolve? Thanks
No need to touch hot_standby_feedback. As others have mentioned, setting it to on can bloat the master. Imagine opening a transaction on a slave and never closing it.
Instead, set max_standby_archive_delay and max_standby_streaming_delay to some sane value:
# /etc/postgresql/10/main/postgresql.conf on a slave
max_standby_archive_delay = 900s
max_standby_streaming_delay = 900s
This way queries on slaves with a duration less than 900 seconds won't be cancelled. If your workload requires longer queries, just set these options to a higher value.
Running queries on a hot-standby server is somewhat tricky: a query can fail because rows it needs might be updated or deleted on the primary while it runs. Since the primary does not know that a query has started on the secondary, it thinks it can clean up (vacuum) old versions of its rows. The secondary then has to replay this cleanup, and has to forcibly cancel all queries that might use these rows.
Longer queries will be canceled more often.
You can work around this by starting a repeatable read transaction on primary which does a dummy query and then sits idle while a real query is run on secondary. Its presence will prevent vacuuming of old row versions on primary.
More on this subject and other workarounds are explained in Hot Standby — Handling Query Conflicts section in documentation.
There's no need to start idle transactions on the master. In postgresql-9.1 the
most direct way to solve this problem is by setting
hot_standby_feedback = on
This will make the master aware of long-running queries. From the docs:
The first option is to set the parameter hot_standby_feedback, which prevents
VACUUM from removing recently-dead rows and so cleanup conflicts do not occur.
Why isn't this the default? This parameter was added after the initial
implementation and it's the only way that a standby can affect a master.
As stated here about hot_standby_feedback = on :
Well, the disadvantage of it is that the standby can bloat the master,
which might be surprising to some people, too
And here:
With what setting of max_standby_streaming_delay? I would rather
default that to -1 than default hot_standby_feedback on. That way what
you do on the standby only affects the standby
So I added
max_standby_streaming_delay = -1
And no more pg_dump error for us, nor master bloat :)
For AWS RDS instance, check http://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/Appendix.PostgreSQL.CommonDBATasks.html
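The table data on the hot standby slave server is modified while a long-running query is running. A solution (PostgreSQL 9.1+) to make sure the table data is not modified is to suspend WAL replay and resume it after the query. The function names below apply to PostgreSQL 9.1 through 9.6; from PostgreSQL 10 on they are named pg_wal_replay_pause() and pg_wal_replay_resume():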
select pg_xlog_replay_pause(); -- suspend
select * from foo; -- your query
select pg_xlog_replay_resume(); --resume
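If the pause and resume calls are issued from application code, it is worth making sure that replay is resumed even when the query fails. Below is a minimal Python sketch of that idea. The helper name paused_replay and the legacy_names flag are my own; the cursor is assumed to be any DB-API style cursor connected to the standby, and the pg_wal_* names assume PostgreSQL 10 or later.

```python
from contextlib import contextmanager


@contextmanager
def paused_replay(cursor, legacy_names=False):
    """Pause WAL replay on a standby for the duration of a with-block.

    legacy_names=True uses the pg_xlog_* function names of PostgreSQL 9.1-9.6.
    """
    prefix = "pg_xlog" if legacy_names else "pg_wal"
    cursor.execute(f"SELECT {prefix}_replay_pause()")
    try:
        yield cursor
    finally:
        # Always resume, otherwise the standby keeps falling behind.
        cursor.execute(f"SELECT {prefix}_replay_resume()")
```

Usage would look like `with paused_replay(cur): cur.execute("SELECT * FROM foo")`. Keep in mind that the standby does not apply any changes while replay is paused, so every other query on it sees increasingly stale data.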
I'm going to add some updated info and references to @max-malysh's excellent answer above.
In short, if you do something on the master, it needs to be replicated on the slave. Postgres uses WAL records for this, which are sent after every logged action on the master to the slave. The slave then executes the action and the two are again in sync. In one of several scenarios, you can be in conflict on the slave with what's coming in from the master in a WAL action. In most of them, there's a transaction happening on the slave which conflicts with what the WAL action wants to change. In that case, you have two options:
1. Delay the application of the WAL action for a bit, allowing the slave to finish its conflicting transaction, then apply the action.
2. Cancel the conflicting query on the slave.
We're concerned with #1, and two values:
max_standby_archive_delay - this is the delay used after a long disconnection between the master and slave, when the data is being read from a WAL archive, which is not current data.
max_standby_streaming_delay - delay used for cancelling queries when WAL entries are received via streaming replication.
Generally, if your server is meant for high availability replication, you want to keep these numbers short. The default setting of 30000 (milliseconds if no units given) is sufficient for this. If, however, you want to set up something like an archive, reporting- or read-replica that might have very long-running queries, then you'll want to set this to something higher to avoid cancelled queries. The recommended 900s setting above seems like a good starting point. I disagree with the official docs on setting an infinite value -1 as being a good idea--that could mask some buggy code and cause lots of issues.
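To compare values such as 30000, 900s and -1 at a glance, here is a small Python sketch that normalizes a delay setting to milliseconds. It is my own simplified helper, not PostgreSQL's parser: it assumes whole numbers and only the units ms, s, min, h and d, and treats a bare number as milliseconds, as noted above.

```python
import re

# Multipliers to milliseconds for the units this sketch understands.
_UNIT_MS = {"ms": 1, "s": 1_000, "min": 60_000, "h": 3_600_000, "d": 86_400_000}


def standby_delay_ms(value):
    """Return a max_standby_*_delay setting in milliseconds (-1 stays -1)."""
    match = re.fullmatch(r"\s*(-?\d+)\s*(ms|s|min|h|d)?\s*", value)
    if match is None:
        raise ValueError(f"unsupported delay value: {value!r}")
    number, unit = match.groups()
    # No unit given: the number is taken as milliseconds.
    return int(number) * _UNIT_MS[unit or "ms"]


def waits_forever(value):
    """True if the setting tells the standby to never cancel for this reason."""
    return standby_delay_ms(value) == -1
```

For example, standby_delay_ms("900s") and standby_delay_ms("15min") both give 900000, while the default 30000 corresponds to 30 seconds.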
The one caveat about long-running queries and setting these values higher is that other queries running on the slave in parallel with the long-running one which is causing the WAL action to be delayed will see old data until the long query has completed. Developers will need to understand this and serialize queries which shouldn't run simultaneously.
For the full explanation of how max_standby_archive_delay and max_standby_streaming_delay work and why, go here.
This answer might come late, but we faced the same kind of issue in production.
Earlier we had only one RDS instance, and as the number of users on the app side increased, we decided to add a Read Replica for it. The read replica worked properly on staging, but once we moved to production we started getting the same error.
We solved this by enabling the hot_standby_feedback property in the Postgres properties.
We referred to the following link:
https://aws.amazon.com/blogs/database/best-practices-for-amazon-rds-postgresql-replication/
I hope it will help.
Likewise, here's a second caveat to @Artif3x's elaboration of @max-malysh's excellent answer, both above.
With any delayed application of transactions from the master the follower(s) will have an older, stale view of the data. Therefore while providing time for the query on the follower to finish by setting max_standby_archive_delay and max_standby_streaming_delay makes sense, keep both of these caveats in mind:
the value of the follower as a standby / backup diminishes
any other queries running on the follower may return stale data.
If the value of the follower for backup ends up being too much in conflict with hosting queries, one solution would be multiple followers, each optimized for one or the other.
Also, note that several queries in a row can cause the application of WAL entries to keep being delayed. So when choosing the new values, consider not just the time for a single query, but a moving window that starts whenever a conflicting query starts and ends when the WAL entry is finally applied.
I am facing an issue of data duplication due to an asynchronous cluster setup. Data is duplicated because of the delay between the primary and the standby server. Synchronous replication is a good option for data accuracy, but it degrades performance and brings other issues. I do not want to accept the performance degradation of synchronous replication.
I found a parameter (disable_load_balance_on_write) at pgpool.conf file, which can solve this problem. There are 4 values
transaction
trans_transaction
always
dml_adaptive.
I have set "always", but the issue is that all SELECT queries are then read from the primary node, whether or not they follow a write, while the standby remains idle.
My requirement is that after any data is updated, inserted or deleted,
it should be read from the primary node until the standby has copied the
latest data, while all SELECT queries on data that has not been modified
should be read from the standby node.
What would be the appropriate configuration for these requirements?
I would appreciate expert suggestions on this.
After reading the documentation of PgPool I was left confused which option would suit my use case best. I need a main database instance which would serve the queries and 1 or more replicas (standbys) of the main one which would be used for disaster recovery scenarios.
What is very important for me is that all transactions committed successfully to the master node are guaranteed to be replicated eventually to the replicas such that when a failover occurs, the replica database instance has all transactions up to and including the latest one applied to it.
In terms of asynchronous replication, I have not seen any mention whether that is the case in the PgPool documentation, however, it does indeed mention some potential data loss occurring which is a bit too vague for me to draw any conclusions.
To combat this data loss, the documentation suggests using synchronous streaming replication, which ensures that all replicas have also applied a change before the transaction is committed on the main node. This method is slower than the asynchronous one, but if there is no data loss, it could be viable.
Is synchronous replication the only method that allows me to achieve my use-case or would the asynchronous replication also do the trick? Also, what constitutes the potential data loss in the asynchronous replication?
Asynchronous replication means that the primary server does not wait for a standby server before reporting a successful COMMIT to the client. As a consequence, if the primary server fails, it is possible that the client believes that a certain transaction is committed, but none of the standby servers has yet received the WAL information. In a high availability setup, where you promote a standby in case of loss of the primary server, that means that you could potentially lose committed transactions, although it typically takes only a split second for the information to reach the standby.
With synchronous replication, the primary waits until the first available synchronous standby server has reported that it has received and persisted the WAL information before reporting a successful COMMIT to the client (the details of this, like which standby server is chosen, how many of them have to report back and when exactly WAL counts as received by the standby are configurable). So no transaction that has been reported committed to the client can get lost, even if the primary server is gone for good.
While it is technically simple to configure synchronous replication, it poses an architectural and design problem, so that asynchronous replication is often the better choice:
Synchronous replication slows down all data modification drastically. To work reasonably well, the network latency between primary and standby has to be very low. You usually cannot reasonably use synchronous replication between different data centers.
Synchronous replication reduces the availability of the whole system, because failure of the standby server prevents any data modification from succeeding. For that reason, you need to have at least two synchronous standby servers, one that is kept synchronized and one as a stand-in.
Even with synchronous replication, it is not guaranteed that reading from the standby after writing to the primary will give you the new data, because by default the primary doesn't wait for WAL to be replayed on the standby. If you want that, you need to set synchronous_commit = remote_apply on the primary server. But since queries on the standby can conflict with WAL replay, you will either have to deal with replication (and commit) delay or canceled queries on the standby. So using synchronous replication as a tool for horizontal scaling is reasonably possible only if you can deal with data modifications not being immediately visible on the standby.
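The difference between the synchronous_commit levels can be summarized in a small Python sketch. The table reflects my reading of the documented levels and assumes that synchronous_standby_names names at least one standby (without one, the remote levels cannot wait for anything); the enum and function names are purely illustrative.

```python
from enum import IntEnum


class StandbyProgress(IntEnum):
    """How far the synchronous standby must be before COMMIT returns."""
    NONE = 0     # primary does not wait for the standby at all
    WRITTEN = 1  # standby has written the WAL, but not necessarily flushed it
    FLUSHED = 2  # standby has flushed the WAL to durable storage
    APPLIED = 3  # standby has replayed the WAL, so queries can see it


WAITS_FOR = {
    "off": StandbyProgress.NONE,
    "local": StandbyProgress.NONE,
    "remote_write": StandbyProgress.WRITTEN,
    "on": StandbyProgress.FLUSHED,
    "remote_apply": StandbyProgress.APPLIED,
}


def standby_has_wal_at_commit(level):
    """True if the standby has received the WAL when COMMIT returns."""
    return WAITS_FOR[level] >= StandbyProgress.WRITTEN


def visible_on_standby_at_commit(level):
    """True if a read on the standby right after COMMIT sees the new data."""
    return WAITS_FOR[level] == StandbyProgress.APPLIED
```

Only remote_apply makes visible_on_standby_at_commit true, which is why the default synchronous setting protects against data loss but not against stale reads on the standby.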
If we compare multiple types of replication (single-leader, multi-leader or leaderless), single-leader replication has the possibility of being linearizable. In my understanding, linearizability means that once a write completes, all later reads should return that value, or the value of a later write. In other words, the system should give the impression that there is only one database. So, I guess, no stale reads.
PostgreSQL's streaming replication can make all its replicas synchronous using synchronous_standby_names, and it can be fine-tuned with the synchronous_commit option, which can be set to remote_apply so that the leader waits until the transaction is replayed on the standby (making it visible to queries). In the documentation, the paragraph about the remote_apply option states that this allows load-balancing in simple cases with causal consistency.
A few pages back, it says this:
"Some solutions are synchronous, meaning that a data-modifying transaction is not considered committed until all servers have committed the transaction. This guarantees that a failover will not lose any data and that all load-balanced servers will return consistent results no matter which server is queried."
So I'm struggling to understand what is guaranteed, and what anomalies can happen if we load-balance read queries to the read replicas. Can there still be stale reads? Can querying different replicas return different results even when no further write has happened on the leader? My impression is yes, but I'm not really sure.
If not, how does PostgreSQL prevent stale reads? I did not find any details on how it works under the hood. Does it use two-phase commit, some modification of it, or some other algorithm to prevent stale reads?
If it does not offer a way to rule out stale reads, is there a way to accomplish that? I saw that PgPool has an option to load-balance only to replicas that are behind by no more than a defined threshold, but I did not understand whether it can be configured to load-balance only to replicas that are fully caught up with the leader.
It's really confusing to me whether anomalies can happen with fully synchronous replication in PostgreSQL.
I understand that setup like this has problems with availability, but that is now not a concern.
If you use synchronous replication with synchronous_commit = remote_apply, you can be certain that you will see the modified data on the standby as soon as the modifying transaction has committed.
Synchronous replication does not use two-phase commit; the primary server first commits locally and then simply waits for the feedback of the synchronous standby server before COMMIT returns. So the following is possible:
An observer will see the modified data on the primary before COMMIT returns, and before the data are propagated to the standby.
An observer will see the modified data on the standby before the COMMIT on the primary returns.
If the committing transaction is interrupted on the primary at the proper moment before COMMIT returns, the transaction will already be committed only on the primary. There is always a certain time window between the time the commit happens on the server and the time it is reported to the client, but that window increases considerably with streaming replication.
Eventually, though, the data modification will always make it to the standby.
My web app uses ADO.NET against SQL Server 2008. Database writes happen against a primary (publisher) database, but reads are load balanced across the primary and a secondary (subscriber) database. We use SQL Server's built-in transactional replication to keep the secondary up-to-date. Most of the time, the couple of seconds of latency is not a problem.
However, I do have a case where I'd like to block until the transaction is committed at the secondary site. Blocking for a few seconds is OK, but returning a stale page to the user is not. Is there any way in ADO.NET or TSQL to specify that I want to wait for the replication to complete? Or can I, from the publisher, check the replication status of the transaction without manually connecting to the secondary server?
[edit]
99.9% of the time, the data in the subscriber is "fresh enough". But there is one operation that invalidates it. I can't read from the publisher every time on the off chance that it's become invalid. If I can't solve this problem under transactional replication, can you suggest an alternate architecture?
There's no such solution for SQL Server, but here's how I've worked around it in other environments.
Use three separate connection strings in your application, and choose the right one based on the needs of your query:
Realtime - Points directly at the one master server. All writes go to this connection string, and only the most mission-critical reads go here.
Near-Realtime - Points at a load balanced pool of subscribers. No writes go here, only reads. Used for the vast majority of OLTP reads.
Delayed Reporting - In your environment right now, it's going to point to the same load-balanced pool of subscribers, but down the road you can use a technology like log shipping to have a pool of servers 8-24 hours behind. These scale out really well, but the data's far behind. It's great for reporting, search, long-term history, and other non-realtime needs.
If you design your app to use those 3 connection strings from the start, scaling is a lot easier, especially in the case you're experiencing.
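As a rough illustration of the three-connection-string idea, here is a Python sketch (the original setup is ADO.NET, but the routing logic is the same in any language). The server names, the Freshness enum and the connection_string_for helper are all hypothetical; the delayed-reporting entry points at the same subscriber pool as near-realtime, as suggested above for the current environment.

```python
from enum import Enum


class Freshness(Enum):
    REALTIME = "realtime"                    # must see the latest committed data
    NEAR_REALTIME = "near_realtime"          # a few seconds of lag is acceptable
    DELAYED_REPORTING = "delayed_reporting"  # hours of lag are acceptable


# Hypothetical connection strings; replace with your own servers.
CONNECTION_STRINGS = {
    Freshness.REALTIME: "Server=publisher.example.internal;Database=app",
    Freshness.NEAR_REALTIME: "Server=subscribers.example.internal;Database=app",
    # Same pool for now; can later point at log-shipped reporting servers.
    Freshness.DELAYED_REPORTING: "Server=subscribers.example.internal;Database=app",
}


def connection_string_for(freshness, is_write=False):
    """Pick a connection string; every write goes to the publisher."""
    if is_write:
        return CONNECTION_STRINGS[Freshness.REALTIME]
    return CONNECTION_STRINGS[freshness]
```

The one operation that must not return a stale page would request Freshness.REALTIME for its follow-up read, while the rest of the application keeps reading from the subscriber pool.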
You are describing a synchronous mirroring situation. Replication cannot, by definition, support your requirement. Replication must wait for a transaction to commit before reading it from the log and delivering it to the distributor and from there to the subscriber, which means replication by definition has a window of opportunity for data to be out of sync.
If you have a requirement for an operation to read the authoritative copy of the data, then you should make that decision in the client and ensure you read from the publisher in that case.
While you can, in theory, validate whether a certain transaction was distributed to the subscriber or not, you should not base your design on it. Transactional replication makes no latency guarantee, by design, so you cannot rely on a 'perfect day' operation mode.