We have a system on PostgreSQL 12.x where all changes are written to the master database server, and two read-only async streaming replicas are used to offload read-only transactions that can tolerate slightly stale data.
Because an async replica may lag behind the master, we need a way to query the replication latency (delay). We do not want to contact the master server for this, so one obvious approach is to query the delay from the replica server:
select
(extract(epoch from now()) - extract(epoch from last_msg_send_time)) * 1000 as delay_ms
from pg_stat_wal_receiver;
However, it seems that pg_stat_wal_receiver has no data on our slave machines. It does have one row, but only the pid column is populated and every other column is empty. The documentation is unclear about the details, but could it be that pg_stat_wal_receiver has data only for sync streaming replicas?
Is there a way to figure out streaming delay of async replica? I'm hoping this is just some kind of configuration error instead of "this is not supported".
All the server machines are running PostgreSQL 12.2 but the client machines are still running PostgreSQL 9.5 client library in case it makes a difference.
I think I can answer the question about the missing columns of pg_stat_wal_receiver. To read the rest of the columns, you need to log in as a superuser or as a role that has been granted the pg_read_all_stats role.
This behavior is documented in the source code of walreceiver.c, in the implementation of pg_stat_get_wal_receiver:
...
/*
* Only superusers and members of pg_read_all_stats can see details.
* Other users only get the pid value to know whether it is a WAL
* receiver, but no details.
*/
...
I don't understand why the view pg_stat_wal_receiver does not have data, but here's a workaround for the missing latency data:
select now() - pg_last_xact_replay_timestamp() as replication_lag;
or if you want the lag as milliseconds (plain number):
select round(extract(epoch from (now() - pg_last_xact_replay_timestamp())) * 1000) as replication_lag_ms;
Note that this uses function pg_last_xact_replay_timestamp() (emphasis mine):
Get time stamp of last transaction replayed during recovery. This is
the time at which the commit or abort WAL record for that transaction
was generated on the primary. If no transactions have been replayed
during recovery, this function returns NULL. Otherwise, if recovery is
still in progress this will increase monotonically. If recovery has
completed then this value will remain static at the value of the last
transaction applied during that recovery. When the server has been
started normally without recovery the function returns NULL.
However, async streaming replication does appear to increment this timestamp continuously when the system is under normal load (active writing on the master). It's still unclear whether this timestamp stops increasing when the master has no changes even though streaming replication is active.
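One caveat of the workaround above: an idle master makes the replica look lagged, because pg_last_xact_replay_timestamp() stops advancing when there is nothing left to replay. A common mitigation (a sketch, assuming the PostgreSQL 10+ function names; older releases spell them pg_last_xlog_receive_location()/pg_last_xlog_replay_location()) is to report zero lag whenever the replica has already replayed everything it has received:

```sql
-- Run on the replica: report 0 when fully caught up, otherwise the
-- time since the last replayed transaction was committed on the master.
SELECT CASE
         WHEN pg_last_wal_receive_lsn() = pg_last_wal_replay_lsn() THEN 0
         ELSE round(extract(epoch FROM (now() - pg_last_xact_replay_timestamp())) * 1000)
       END AS replication_lag_ms;
```

This still needs no connection to the master, since both functions run on the standby.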
I am using Aurora Serverless (PostgreSQL 10.x).
Webhook data is streamed into AWS SQS, which triggers a Lambda function to process it.
NOTE: the Lambda is set to "concurrency=1" to ensure it runs just one invocation at any time.
When the incoming data is just one record every few seconds, the balance update is fine.
But when the input is a batch of just 5 records, I am seeing some weird updates to the balance field.
Has anyone faced such an issue? Any tips?
The critical SQL is a single statement, shown below. It takes 20 milliseconds per execution, yet the next transaction is not able to see the balance and update it correctly.
with new_balance as (
    update accounts
       set account_balance = account_balance - 1.0,
           reserved_balance = reserved_balance - 1.0
     where iban = '49'
 returning account_balance as new_acct_balance
)
insert into ledger_transactions (iban, end_to_end_txn_id, amount, balance)
select '49', 'b946733af9', 1.0, new_acct_balance from new_balance;
What have I tried so far:
1. I have tried all transaction isolation levels (repeatable read, serializable, read uncommitted, and read committed), both at the database level and at the transaction level.
2. Tried delaying the SQS queue so it delivers one payload every 5 seconds. Weirdly, the insert timestamps of the rows in the ledger_transactions table still appear in rapid succession; the 5-second spacing I expected from SQS delivering one row every 5 seconds is not happening.
3. Wrote the above SQL as several individual statements.
4. If I create a temp_account_balance table, force an insert of the incoming data, and compute and populate account_balance from it, that works fine. But this is unnecessary overkill.
I would like each subscriber server to monitor its health without accessing the publisher server.
1. I use the following query on the publisher to get the lag. Is it possible to compute the lag from the subscriber server as well?
SELECT
slot_name, active, confirmed_flush_lsn, pg_current_wal_lsn(),
(pg_current_wal_lsn() - confirmed_flush_lsn) AS bytes_lag
FROM pg_replication_slots;
If I run the following on the subscriber
select received_lsn, latest_end_lsn from pg_stat_subscription
I would still need to run select pg_current_wal_lsn(); on the publisher.
Is there a way to know the lag without accessing the publisher?
2. I have a duplicate value in one of the tables that caused the replication to stop, but
select srsubstate from pg_subscription_rel
is showing 'r' for all tables.
How can I know which table is problematic?
How can I know why the replication stopped?
3. How can a subscriber know that its logical slot or even the publisher was dropped?
No, you cannot get that information from the subscriber. The subscriber doesn't know what there is to receive that it has not yet received.
To figure out the cause when replication breaks, you have to look at the subscriber's log file. Yes, that is manual activity, but so is conflict resolution.
You will quickly figure out if the replication slot has been dropped, because there will be nasty error messages in the log. This is quite similar to dropped tables.
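One quick check that does work from the subscriber side (a sketch): when the apply worker keeps dying, e.g. on a duplicate key conflict, pg_stat_subscription tends to show no active worker between restart attempts. The log remains the authoritative source for the reason, but this gives an automatable health signal:

```sql
-- A NULL pid (or a stale last_msg_receipt_time) suggests the apply
-- worker for that subscription is not currently running.
select subname, pid, received_lsn, last_msg_receipt_time
from pg_stat_subscription;
```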
We have a Postgres 12 system running one master and two async hot-standby replica servers, and we use SERIALIZABLE transactions. All the database servers have very fast SSD storage for Postgres and 64 GB of RAM. Clients connect directly to the master server if they cannot accept delayed data for a transaction. Read-only clients that accept data up to 5 seconds old query the replica servers, using REPEATABLE READ transactions.
I'm aware that because we use SERIALIZABLE transactions, Postgres might give us false positive serialization failures and force us to retry transactions. This is fine and expected.
However, the problem I'm seeing is that a single-row INSERT or UPDATE query randomly stalls for a very long time. As an example, one error case was as follows (speaking directly to the master to allow modifying table data):
A simple single row insert
insert into restservices (id, parent_id, ...) values ('...', '...', ...);
stalled for 74.62 seconds before finally emitting error
ERROR 40001 could not serialize access due to concurrent update
with error context
SQL statement "SELECT 1 FROM ONLY "public"."restservices" x WHERE "id" OPERATOR(pg_catalog.=) $1 FOR KEY SHARE OF x"
We log all queries exceeding 40 ms so I know this kind of stall is rare. Like maybe a couple of queries a day. We average around 200-400 transactions per second during normal load with 5-40 queries per transaction.
After finally getting the above error, the client code automatically released two savepoints, rolled back the transaction, and disconnected from the database (this cleanup took 2 ms in total). It then reconnected 2 ms later and replayed the whole transaction from the start, finishing in 66 ms including the time to connect to the database. So I don't think this is about the performance of the client or the master server as a whole. The expected transaction time is 5-90 ms depending on the transaction.
Is there some PostgreSQL connection or master configuration setting that I can use to make PostgreSQL return error 40001 faster, even if it causes more transactions to be rolled back? Does anybody know if setting
set local statement_timeout='250'
within the transaction has dangerous side-effects? According to the documentation https://www.postgresql.org/docs/12/runtime-config-client.html, "Setting statement_timeout in postgresql.conf is not recommended because it would affect all sessions", but I could set the timeout only for transactions by this client, which is able to automatically retry the transaction very fast.
Is there anything else to try?
It looks like someone had the parent row to the one you were trying to insert locked. PostgreSQL doesn't know what to do about that until the lock is released, so it blocks. If you failed rather than blocking, and upon failure retried the exact same thing, the same parent row would (most likely) still be locked and so would just fail again, and you would busy-wait. Busy-waiting is not good, so blocking rather than failing is generally a good thing here. It blocks and then unblocks only to fail, but once it does fail a retry should succeed.
An obvious exception to blocking-better-than-failing is when, on retry, you can pick a different parent row, if that makes sense in your context. In that case, maybe the best thing to do is to explicitly lock the parent row with NOWAIT before attempting the insert. That way you can deal with failures in a more nuanced way.
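A sketch of that approach, assuming the parent row lives in restservices itself (which the FOR KEY SHARE error context suggests); the id values and column list are placeholders:

```sql
begin;
-- Fail immediately with SQLSTATE 55P03 (lock_not_available)
-- instead of blocking, if someone else holds a conflicting lock
-- on the parent row.
select 1 from restservices where id = 'parent-id-here' for key share nowait;
insert into restservices (id, parent_id) values ('child-id-here', 'parent-id-here');
commit;
```

On 55P03 the application can back off, pick a different parent, or surface the conflict, instead of sitting blocked for tens of seconds.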
If you must retry with the same parent_id, then I think the only real solution is to figure out who is holding the parent row lock for so long, and fix that. I don't think that setting statement_timeout would be hazardous, but it also wouldn't solve your problem, as you would probably just keep retrying until the lock on the offending row is released. (Setting it on the other session, the one holding the lock, might be helpful, depending on what that session is doing while the lock is held.)
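To identify who is holding the lock, one option (a sketch using pg_blocking_pids(), available since 9.6; run it while the insert is stalled) is:

```sql
-- For each blocked backend, show the backend(s) blocking it
-- and what those sessions are currently doing.
select blocked.pid   as blocked_pid,
       blocking.pid  as blocking_pid,
       blocking.state,
       blocking.query as blocking_query
from pg_stat_activity blocked
join pg_stat_activity blocking
  on blocking.pid = any (pg_blocking_pids(blocked.pid));
```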
I am looking for a way to terminate user sessions that have been inactive or open for an arbitrary amount of time in Redshift. I noticed that in STV_SESSIONS I have a large number of sessions open, often for the same user, sometimes having been initialized days earlier. While I understand that this might be a symptom of a larger issue with the way some things close out of Redshift, I was hoping for a configurable timeout solution.
In the AWS documentation I found PG_TERMINATE_BACKEND (http://docs.aws.amazon.com/redshift/latest/dg/PG_TERMINATE_BACKEND.html), but I was hoping for a more automatic solution.
The WLM timeout is only for timing out queries, not sessions:
Timeout (ms)
The maximum time, in milliseconds, queries can run before being canceled. If a read-only query, such as a SELECT statement, is canceled due to a WLM timeout, WLM attempts to route the query to the next matching queue based on the WLM Queue Assignment Rules. If the query doesn't match any other queue definition, the query is canceled; it is not assigned to the default queue. For more information, see WLM Query Queue Hopping. WLM timeout doesn’t apply to a query that has reached the returning state. To view the state of a query, see the STV_WLM_QUERY_STATE system table.
JSON property: max_execution_time
You can use Workload Management Configuration in AWS Redshift, where you can set the user group, query group, and session timeout. You can group all the same users together, assign them a group name, and set the timeout for them. This is how I do it: set up the query queue based on your priority, then set the concurrency level for the user group and the timeout in ms.
For more information, you can refer to AWS documentation.
Source - Workload Management
- Configuring Workload Management
It's pretty easy and straightforward.
If I’ve made a bad assumption please comment and I’ll refocus my answer.
You can use the newly introduced idle session timeout feature in Redshift. It is available both when creating a user and after creation (using an ALTER statement). Look up the SESSION TIMEOUT parameter.
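For example (a sketch; example_user is a placeholder, and the timeout value is in seconds):

```sql
-- Disconnect example_user's sessions after 15 minutes of inactivity.
ALTER USER example_user SESSION TIMEOUT 900;
-- The same parameter can be given at creation time:
-- CREATE USER example_user PASSWORD '...' SESSION TIMEOUT 900;
```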
I am working out a master/slave architecture for my web application in which frontends reading from slaves must only do so if the slave is consistent up to the time of the last known write triggered by the requesting client. Slaves can be inconsistent with respect to the master as long as they are inconsistent only regarding writes by other users, not by the requesting user.
All writes are sent to the master which is easy enough, but the trick is knowing when to send reads to the master versus a slave.
What I would like to do is:
On a write request, at the end of the request processing phase after all writes are committed, take some kind of reading of the database's current transaction pointer and stash it in a cookie on the client response.
On a read request, take the value from this cookie and first check if the slave is caught up to this transaction pointer location. If it's caught up, delete the cookie and read from the slave happily. If not, read from the master and leave the cookie in place.
I am not sure what specific functions to use to achieve this on the master and slave or if they exist at all. I would like to avoid the overhead of a dedicated counter in a table that I have to explicitly update and query, since I presume PG is already doing this for me in some fashion. However, I could do that if necessary.
pg_current_xlog_location on the master and pg_last_xlog_replay_location on the slave look promising, however, I do not know enough to know if these will reliably do the job:
Will an idle master and a caught-up slave always report the exact same values for these functions?
The syntax of their return value is confusing to me, for instance 0/6466270 - how do I convert this string into an integer in a way that I can reliably do a simple greater- or less-than comparison?
Note: I am planning to use streaming replication with slaves in hot standby mode, if that affects the available solutions. I am currently using 9.1, but would entertain an upgrade if that helped.
take some kind of reading of the database's current transaction pointer and stash it in a cookie on the client response.
You can use:
SELECT pg_xlog_location_diff(pg_current_xlog_location(), '0/00000000');
to get an absolute position, but in this case you actually only need to store pg_current_xlog_location(), because:
On a read request, take the value from this cookie and first check if the slave is caught up to this transaction pointer location.
Compare the saved pg_current_xlog_location() with the slave's pg_last_xlog_replay_location() using pg_xlog_location_diff.
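Concretely, if the cookie holds the LSN captured on the master, the read-path check on the slave could look like this (a sketch; pg_xlog_location_diff takes text arguments on 9.2/9.3 and pg_lsn from 9.4 on, and '0/6466270' stands in for the saved cookie value):

```sql
-- Run on the slave: true if it has replayed at least up to the saved LSN,
-- i.e. it is safe to serve this client's read.
SELECT pg_xlog_location_diff(pg_last_xlog_replay_location(), '0/6466270') >= 0
       AS caught_up;
```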
Will an idle master and a caught-up slave always report the exact same values for these functions?
If you're using streaming replication, yes. If you're doing archive based replication, no.
You shouldn't rely on the same value anyway. You just need to know if the slave is new enough.
The syntax of their return value is confusing to me, for instance 0/6466270 - how do I convert this string into an integer in a way that I can reliably do a simple greater- or less-than comparison?
Use pg_xlog_location_diff. It is not available in 9.1 (it was added in 9.2), so you may need to upgrade.
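If upgrading is not an option, you can convert the X/Y form to a number yourself: the part before the slash is the high 32 bits and the part after is the low 32 bits, both in hexadecimal. A sketch of that conversion in SQL, using the example value from the question:

```sql
-- '0/6466270' -> (0x0 << 32) + 0x6466270 = 105276016
SELECT (('x' || lpad(split_part('0/6466270', '/', 1), 8, '0'))::bit(32)::bigint << 32)
     + ('x' || lpad(split_part('0/6466270', '/', 2), 8, '0'))::bit(32)::bigint
       AS lsn_numeric;
```

The resulting bigints can then be compared with ordinary greater-than/less-than operators.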