Go sql package, PostgreSQL and PgBouncer: behavior on retry

Let's imagine that we have PostgreSQL and PgBouncer (in transaction pooling mode).
We also plan to execute the following transaction:
BEGIN;
UPDATE a ...;
UPDATE b ...;
SELECT c ...;
UPDATE d ...;
COMMIT;
When the transaction begins, PgBouncer assigns us a server connection.
Then we execute:
UPDATE a; -- successful
UPDATE b; -- successful
SELECT c; -- successful
UPDATE d; -- failed, because PgBouncer was restarted.
Then we retry using the Go DB client:
UPDATE d;
On the third try we acquire a connection and execute the query. Will this query be executed in the same transaction, or will it run on a new connection and lead to an inconsistent state?
Or does every statement carry some identifier that ties it to a particular transaction?

I can't be 100% certain, since I'm not familiar with the internals of PgBouncer or Postgres, but it stands to reason that a transaction is bound to a connection, since transactions carry no identifier of their own. So as long as the TCP/SQL connection is not restarted, you should be able to resume. But if any of the applications restart, then the transaction is gone.
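A minimal Go sketch of the safe pattern (an illustration, not the asker's code; it assumes the lib/pq driver and keeps the question's placeholder statements): database/sql pins a sql.Tx to a single pooled connection, so once any statement inside the Tx fails, the whole transaction must be rolled back and re-run. Retrying just the failed statement through the pool would run it on a different connection, in autocommit mode, outside any transaction.

package main

import (
	"context"
	"database/sql"
	"log"

	_ "github.com/lib/pq" // assumed driver; any database/sql driver behaves the same
)

func runOnce(ctx context.Context, db *sql.DB) error {
	tx, err := db.BeginTx(ctx, nil) // BEGIN: pins one connection from the pool
	if err != nil {
		return err
	}
	defer tx.Rollback() // no-op if Commit succeeds

	for _, q := range []string{
		"UPDATE a ...", // placeholders from the question
		"UPDATE b ...",
		"SELECT c ...",
		"UPDATE d ...",
	} {
		if _, err := tx.ExecContext(ctx, q); err != nil {
			// The Tx and its connection are now unusable; re-running just
			// this statement with db.ExecContext would use a NEW connection
			// outside any transaction.
			return err
		}
	}
	return tx.Commit()
}

func main() {
	db, err := sql.Open("postgres", "postgres://user:pass@pgbouncer:6432/app")
	if err != nil {
		log.Fatal(err)
	}
	// Retry the whole unit of work, never an individual statement.
	var lastErr error
	for attempt := 0; attempt < 3; attempt++ {
		if lastErr = runOnce(context.Background(), db); lastErr == nil {
			break
		}
	}
	if lastErr != nil {
		log.Fatal(lastErr)
	}
}

In other words, database/sql will never silently move an open transaction to a new connection; after PgBouncer restarts, the retry has to happen at the whole-transaction level in your own code.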

Related

How to run transactional SQL on Redshift using boto3

I'm trying to use the boto3 redshift-data client to execute transactional SQL for an external table (Redshift Spectrum) with the following statement:
ALTER TABLE schema.table ADD IF NOT EXISTS
PARTITION(key=value)
LOCATION 's3://bucket/prefix';
After submitting it with execute_statement, I received the error "ALTER EXTERNAL TABLE cannot run inside a transaction block".
I tried issuing VACUUM and COMMIT commands before the statement, but they just fail with the same complaint: VACUUM or COMMIT cannot run inside a transaction block.
How can I successfully execute such a statement?
This has to do with the settings of your SQL bench: it opens a transaction at the start of every statement you run. Just add "END;" before the statement that needs to run outside of a transaction, and make sure you launch both commands together as a single batch from your bench.
Like this:
END; VACUUM;
It seems it's not easy to run such SQL through boto3. However, I found a workaround using the redshift_connector library:
import redshift_connector

connection = redshift_connector.connect(
    host=host, port=port, database=database, user=user, password=password
)
# Enable autocommit so the statement runs outside a transaction block.
connection.autocommit = True
connection.cursor().execute(transactional_sql)
connection.autocommit = False
Reference - https://docs.aws.amazon.com/redshift/latest/mgmt/python-connect-examples.html#python-connect-enable-autocommit
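For comparison, here is a sketch (not from the original answers) of the same autocommit idea in Go, assuming the lib/pq driver pointed at the cluster endpoint, which works because Redshift speaks the PostgreSQL wire protocol: statements executed outside an explicit transaction are autocommitted, so the DDL never runs inside a transaction block.

package main

import (
	"database/sql"
	"log"

	_ "github.com/lib/pq" // assumed driver; Redshift accepts PostgreSQL-protocol clients
)

func main() {
	db, err := sql.Open("postgres", "postgres://user:pass@cluster:5439/db")
	if err != nil {
		log.Fatal(err)
	}
	// No BeginTx here: database/sql autocommits standalone statements,
	// so the ALTER runs outside a transaction block.
	_, err = db.Exec(`ALTER TABLE schema.table ADD IF NOT EXISTS
	PARTITION(key=value)
	LOCATION 's3://bucket/prefix'`)
	if err != nil {
		log.Fatal(err)
	}
}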

Is my PostgreSQL query still running even if server closed the connection?

Postgres noob here. I have a very long PostgreSQL query running an UPDATE on roughly 3 million rows. I ran this via psql, and after about two hours I got this message:
server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
The connection to the server was lost. Attempting reset: Succeeded.
Is my query still running? I did run:
select *
from pg_stat_activity
where datname = 'mydb';
and I do still see a row for my UPDATE, with state = active, wait_event_type = IO, and wait_event = DataFileRead. Should I be worried that my connection closed? Is my query still running, and is the best way to check for completion to keep polling with
select *
from pg_stat_activity
where datname = 'mydb';
?
Your query will not succeed. Your client lost its connection, and while the backend process that was handling your UPDATE is still running, it will notice that the client disconnected when it tries to return the query status upon completion, and will then abort the transaction (whether or not you had issued a BEGIN; every statement in PostgreSQL runs in a transaction even without BEGIN/COMMIT). You will need to re-issue the UPDATE.

PostgreSQL - how to unlock table record AFTER shutdown

Sorry, I've looked for this everywhere but cannot find a working solution. :/
I badly need this for abnormal-shutdown testing.
What I'm trying to do here is:
1. Insert row in TABLE A
2. Lock this record
3. (At separate terminal) service postgresql-9.6 stop
4. Wait a few moments
5. (At separate terminal) service postgresql-9.6 start
6. "try" to unlock the record by executing "COMMIT;" in the same terminal as #2.
Here is how I did step 2:
BEGIN;
SELECT * FROM TABLE A WHERE X=Y FOR UPDATE;
The problem is that once I do step 6, this error shows up:
DB=# commit;
FATAL: terminating connection due to administrator command
server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
The connection to the server was lost. Attempting reset: Succeeded.
So when I execute "COMMIT;" again, it only shows:
DB=# commit;
WARNING: there is no transaction in progress
COMMIT
Now the record cannot be unlocked.
I've tried getting the PID holding the lock and then executing pg_terminate_backend (or pg_cancel_backend), but it just doesn't work:
DB=# select pg_class.relname,pg_locks.* from pg_class,pg_locks where pg_class.relfilenode=pg_locks.relation;
DB=# select pg_terminate_backend(2450);
FATAL: terminating connection due to administrator command
server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
The connection to the server was lost. Attempting reset: Succeeded.
DB=# select pg_cancel_backend(3417);
ERROR: canceling statement due to user request
Please help. Does anyone have any ideas? :/
..Or is this even possible?
My specs:
Postgresql-9.6
RedHat Linux
There's a fundamental misunderstanding or three here. Lock state is not persistent.
When you lock a record (or table), the lock is associated with the transaction that took the lock. The transaction is part of the running PostgreSQL session, your connection to the server.
Locks are released at the end of transactions.
Transactions end:
On explicit COMMIT or ROLLBACK;
When a session disconnects without an explicit COMMIT of the open transaction, triggering an implied ROLLBACK;
When the server shuts down, terminating all active sessions, again triggering an implied ROLLBACK of all in-progress transactions.
Thus, you have released the lock you took at step 2 when you shut the server down at step 3. The transaction that acquired that lock no longer exists because its session was terminated by server shutdown.
If you examine pg_locks, you'll see that the locks are present before the restart and gone after it.

Can I execute a stored procedure 'detached' (not keeping DB connection open) on PostgreSQL?

I want to execute a long-running stored procedure on PostgreSQL 9.3. Our database server is (for the sake of this question) guaranteed to be running stably, but the machine calling the stored procedure can be shut down at any second (Heroku dynos get cycled every 24h).
Is there a way to run the stored procedure 'detached' on PostgreSQL? I do not care about its output. Can I run it asynchronously and then let the database server keep working on it while I close my database connection?
We're using Python and the psycopg2 driver, but I don't care so much about the implementation itself. If the possibility exists, I can figure out how to call it.
I found notes on the asynchronous support and the aiopg library and I'm wondering if there's something in those I could possibly use.
No, you can't run a function that keeps on running after the connection you started it from terminates. When the PostgreSQL server notices that the connection has dropped, it will terminate the function and roll back the open transaction.
With PostgreSQL 9.3 or 9.4 it'd be possible to write a simple background worker to run procedures for you via a queue table, but this requires the ability to compile and install new C extensions into the server - something you can't do on Heroku.
Try to reorganize your function into smaller units of work that can be completed individually. Huge, long-running functions are problematic for other reasons, and should be avoided even if unstable connections aren't a problem.
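To make that concrete, here is a sketch of the queue-table idea (shown in Go; the same pattern works from psycopg2). The jobs(id, done) table and the do_step() function are hypothetical stand-ins for one small unit of the original procedure; each unit commits on its own, so work finished before a dyno restart is kept and the next run resumes from the queue.

package main

import (
	"context"
	"database/sql"
	"log"

	_ "github.com/lib/pq" // assumed driver
)

// processOne claims one pending job and runs a single small unit of work.
func processOne(ctx context.Context, db *sql.DB) (bool, error) {
	tx, err := db.BeginTx(ctx, nil)
	if err != nil {
		return false, err
	}
	defer tx.Rollback()

	var id int64
	// Claim one pending job. (On PostgreSQL 9.5+ you could add SKIP LOCKED
	// to let several workers share the queue; 9.3 has plain FOR UPDATE only.)
	err = tx.QueryRowContext(ctx,
		"SELECT id FROM jobs WHERE NOT done ORDER BY id LIMIT 1 FOR UPDATE").Scan(&id)
	if err == sql.ErrNoRows {
		return false, nil // queue drained
	}
	if err != nil {
		return false, err
	}
	// do_step is a hypothetical small unit of the original procedure.
	if _, err := tx.ExecContext(ctx, "SELECT do_step($1)", id); err != nil {
		return false, err
	}
	if _, err := tx.ExecContext(ctx, "UPDATE jobs SET done = true WHERE id = $1", id); err != nil {
		return false, err
	}
	// Committing per unit means a client shutdown loses at most one unit.
	return true, tx.Commit()
}

func main() {
	db, err := sql.Open("postgres", "postgres://user:pass@dbhost/app")
	if err != nil {
		log.Fatal(err)
	}
	for {
		more, err := processOne(context.Background(), db)
		if err != nil {
			log.Fatal(err)
		}
		if !more {
			return
		}
	}
}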

How to close idle connections in PostgreSQL automatically?

Some clients connect to our PostgreSQL database but leave their connections open.
Is it possible to tell PostgreSQL to close those connections after a certain amount of inactivity?
TL;DR
IF you're using a Postgresql version >= 9.2
THEN use the solution I came up with
IF you don't want to write any code
THEN use arqnid's solution
IF you don't want to write any code
AND you're using a Postgresql version >= 14
THEN use Laurenz Albe's solution
For those who are interested, here is the solution I came up with, inspired from Craig Ringer's comment:
(...) use a cron job to look at when the connection was last active (see pg_stat_activity) and use pg_terminate_backend to kill old ones.(...)
The chosen solution comes down to this:
First, we upgrade to Postgresql 9.2.
Then, we schedule a thread to run every second (a sketch of such a thread follows the SQL query below).
When the thread runs, it looks for any old inactive connections.
A connection is considered inactive if its state is either idle, idle in transaction, idle in transaction (aborted) or disabled.
A connection is considered old if its state has stayed the same for more than 5 minutes.
There are additional threads that do the same as above; however, those threads connect to the database as a different user.
We leave at least one connection open for each application connected to our database (hence the rank() function).
This is the SQL query run by the thread:
WITH inactive_connections AS (
SELECT
pid,
rank() over (partition by client_addr order by backend_start ASC) as rank
FROM
pg_stat_activity
WHERE
-- Exclude the thread owned connection (ie no auto-kill)
pid <> pg_backend_pid( )
AND
-- Exclude known applications connections
application_name !~ '(?:psql)|(?:pgAdmin.+)'
AND
-- Include connections to the same database the thread is connected to
datname = current_database()
AND
-- Include connections using the same thread username connection
usename = current_user
AND
-- Include inactive connections only
state in ('idle', 'idle in transaction', 'idle in transaction (aborted)', 'disabled')
AND
-- Include old connections (found with the state_change field)
current_timestamp - state_change > interval '5 minutes'
)
SELECT
pg_terminate_backend(pid)
FROM
inactive_connections
WHERE
rank > 1 -- Leave one connection for each application connected to the database
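For illustration, a sketch of that scheduled thread in Go, under assumptions not in the answer (the lib/pq driver and a dedicated watchdog user); it simply re-runs the query above on a one-second ticker:

package main

import (
	"database/sql"
	"log"
	"time"

	_ "github.com/lib/pq" // assumed driver
)

// killQuery is the termination query shown above, unchanged.
const killQuery = `
WITH inactive_connections AS (
    SELECT pid,
           rank() OVER (PARTITION BY client_addr ORDER BY backend_start ASC) AS rank
    FROM pg_stat_activity
    WHERE pid <> pg_backend_pid()
      AND application_name !~ '(?:psql)|(?:pgAdmin.+)'
      AND datname = current_database()
      AND usename = current_user
      AND state IN ('idle', 'idle in transaction', 'idle in transaction (aborted)', 'disabled')
      AND current_timestamp - state_change > interval '5 minutes'
)
SELECT pg_terminate_backend(pid)
FROM inactive_connections
WHERE rank > 1`

func main() {
	db, err := sql.Open("postgres", "postgres://watchdog:pass@dbhost/app")
	if err != nil {
		log.Fatal(err)
	}
	// Run every second, as described above; errors are logged rather than
	// fatal so the watchdog keeps going.
	for range time.Tick(time.Second) {
		if _, err := db.Exec(killQuery); err != nil {
			log.Printf("terminating idle connections: %v", err)
		}
	}
}

Running one copy per database/user pair reproduces the "additional threads" described above, since the query filters on current_database() and current_user.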
If you are using PostgreSQL >= 9.6 there is an even easier solution. Let's suppose you want to terminate connections that have been idle in a transaction for more than 5 minutes; just run the following:
alter system set idle_in_transaction_session_timeout='5min';
In case you don't have access as superuser (example on Azure cloud), try:
SET SESSION idle_in_transaction_session_timeout = '5min';
But the latter will work only for the current session, which is most likely not what you want.
To disable the feature,
alter system set idle_in_transaction_session_timeout=0;
or
SET SESSION idle_in_transaction_session_timeout = 0;
(by the way, 0 is the default value).
If you use alter system, you must reload the configuration for the change to take effect (for example with SELECT pg_reload_conf();). The change is persistent, so you won't have to re-run the query if, for example, you restart the server.
To check the feature status:
show idle_in_transaction_session_timeout;
Connect through a proxy like PgBouncer, which will close server connections after server_idle_timeout seconds.
From PostgreSQL v14 on, you can set the idle_session_timeout parameter to automatically disconnect client sessions that are idle.
If you use AWS with PostgreSQL >= 9.6, you have to do the following:
Create custom parameter group
go to RDS > Parameter groups > Create parameter group
Select the PostgreSQL version that you use, name the group 'customParameters' or whatever, and add the description 'handle idle connections'.
Change the idle_in_transaction_session_timeout value
Fortunately it will create a copy of the default AWS group so you only have to tweak the things that you deem not suitable for your use-case.
Now click on the newly created parameter group and search 'idle'.
The default value for 'idle_in_transaction_session_timeout' is 24 hours (86400000 milliseconds). The value is expressed in milliseconds, so just multiply the number of minutes you want by 60000: for example, 300000 for a 5-minute timeout.
Assign the group
Last, but not least, change the group:
go to RDS, select your DB and click on 'Modify'.
Now under 'Database options' you will find 'DB parameter group', change it to the newly created group.
You can then decide if you want to apply the modifications immediately (beware of downtime).
I have a problem with denied connections because too many clients are connected to a PostgreSQL 12 server on Ubuntu 18 (but not on similar projects using the earlier 9.6 and 10 versions).
I wonder if those settings
tcp_keepalives_idle
tcp_keepalives_interval
could be more relevant than
idle_in_transaction_session_timeout
idle_in_transaction_session_timeout indeed closes only connections that are idle inside an open transaction, not inactive connections whose statements completed normally...
The documentation says these socket-level settings have no effect for connections made via Unix-domain sockets, but they should apply to TCP connections on Ubuntu.
Up to PostgreSQL 13, you can use my extension pg_timeout.