I'm fine-tuning a PostgreSQL database and I am about to set the maximum number of prepared transactions with max_prepared_transactions.
The application uses a lot of prepared statements (in the sense of PREPARE name AS ...), but no prepared transactions.
My question is:
Is there a difference between prepared statements and prepared transactions?
Does max_prepared_transactions affect prepared statements?
Yes. PREPARE TRANSACTION is the first phase of a two-phase commit, which is generally used when you want to commit atomically to two databases at the same time.
Prepared statements are a way of asking the server to plan an SQL statement ahead of time, usually so that you can execute the statement multiple times without the overhead of planning it each time. See PREPARE.
The two are unrelated.
No, max_prepared_transactions does not affect prepared statements.
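To make the difference concrete, here is a minimal sketch of both features side by side (the table, statement, and transaction names are made up for illustration; PREPARE TRANSACTION also requires max_prepared_transactions to be set above zero):
-- Prepared statement: plan once, execute many times within this session.
PREPARE find_item(integer) AS SELECT * FROM items WHERE id = $1;
EXECUTE find_item(42);
EXECUTE find_item(43);
DEALLOCATE find_item;        -- or it simply vanishes when the session ends
-- Prepared transaction: phase one of a two-phase commit.
BEGIN;
UPDATE items SET price = price * 1.1 WHERE id = 42;
PREPARE TRANSACTION 'tx_42'; -- persists the transaction state to disk
COMMIT PREPARED 'tx_42';     -- phase two; can be issued later, even from another session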
I have a simple bug in my program, which supports multiple users. I'm using knex to build SQL queries, and here is pseudocode that depicts the scenario:
const value = queryBuilder().readDataFromTheDatabase(); // executes first
// ... do some other work and compute the updated value ...
queryBuilder().writeValueToTheDatabase(updateValue(value));
This piece of code is used in a sort of middleware function. And as you can see, this is a possible race condition: when multiple users execute it at roughly the same time, one of them reads a stale value.
My solution
So, I was thinking a possible solution would be to create a single queryBuilder statement:
queryBuilder().readAndUpdateValueInTheDatabase();
So, I'll probably have to use a little bit of PL/pgSQL. I was wondering if this solution will be sufficient. Will the statement be executed atomically? That is, when one request has read but not yet finished its write, does another request wait to both read and write, or does it wait only to write and still read the stale value?
I think what you are looking for here is isolation, not atomicity. You could set all transactions to the highest isolation level, serializable (which is higher than the usual default level). With that level, if data that a transaction read (and presumably relied upon) is changed, then when it tries to commit it might get a serialization failure error. I say "might", because the system could conclude the situation would be consistent with the data change having happened after the commit, in which case the commit is allowed to stand.
To avoid a race condition with such a setup, you must run both the read and the write in the same database transaction.
There are two ways to do that:
Use the default READ COMMITTED isolation level and lock the rows when you read them:
SELECT ... FROM ... FOR NO KEY UPDATE;
That locks the rows against concurrent modifications, and the lock is held until the end of the transaction.
Use the REPEATABLE READ isolation level and don't lock anything. Then your UPDATE will receive a serialization error (SQLSTATE 40001) if somebody modified the row concurrently. In that case, you roll the transaction back and try again in a new REPEATABLE READ transaction.
The first solution is usually better if you expect conflicts frequently, while the second is better if conflicts are rare.
Note that you should keep the database transaction as short as possible in both cases to keep the risk of conflicts low.
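A minimal sketch of both approaches, with a hypothetical counters table:
-- Approach 1: default READ COMMITTED plus an explicit row lock
BEGIN;
SELECT value FROM counters WHERE id = 1 FOR NO KEY UPDATE; -- concurrent writers block here
-- compute the new value in the application
UPDATE counters SET value = 11 WHERE id = 1;
COMMIT; -- the row lock is released

-- Approach 2: REPEATABLE READ, no explicit lock
BEGIN ISOLATION LEVEL REPEATABLE READ;
SELECT value FROM counters WHERE id = 1;
UPDATE counters SET value = 11 WHERE id = 1; -- fails with SQLSTATE 40001 on a concurrent change
COMMIT; -- on a serialization failure, roll back and retry in a new transaction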
Transactions in PostgreSQL use an optimistic locking model when accessing tables, while some other DBMS do pessimistic locking (IBM Db2) or offer both locking models (MS SQL Server).
Optimistic locking takes a snapshot of the data you are working on, and the modifications are done on the snapshot until the transaction ends. When the transaction finishes, the snapshot modifications are applied to the real database (table rows), but if some other user has made a change between the moment of the snapshot capture and the commit, then the commit cannot apply and the COMMIT is rejected as a ROLLBACK.
You can raise the isolation level (REPEATABLE READ or SERIALIZABLE) to avoid the trouble.
Assuming that I have a table called "t1" in a database "db1" and another table called "t2" in a database "db2", and I need to insert a record into both tables or fail as a whole.
Connected to db1, I guess I should type this:
BEGIN;
PREPARE TRANSACTION 'pepe'; -- the manual says this makes the transaction get stored on disk, so what is the purpose if I can't use it from another database?
insert into t1 (field) values ('a_value');
COMMIT PREPARED 'pepe';
Connected to db2, I guess that:
BEGIN;
PREPARE TRANSACTION 'pepe'; -- this fails (the name of the transaction: what is its meaning, what is it used for?)
-- It complains: ERROR: transaction identifier "pepe" is already in use
insert into t2 (field) values ('another_value');
COMMIT PREPARED 'pepe';
As you can see, I don't get how to use two-phase commit in Postgres.
TL;DR
I'm not getting how to run synchronized commits across different databases within the same RDBMS.
I have read in the official Postgres documentation that for synchronizing work across two or more unrelated Postgres databases, an implementation of the so-called "two-phase commit" protocol is at our disposal.
So I started looking at how people actually use it in Postgres. I did not find any real example; at most I got to a post by someone experimenting with several Postgres clients connected to the different databases, in order to emulate multiple processes running in parallel doing things to the several DBs that should end gracefully (all commit) or dreadfully (all rollback).
Other sources I have peeked at looking for examples were:
https://en.wikipedia.org/wiki/Two-phase_commit_protocol (this source explains the protocol well, but really makes me wonder where or who my "Coordinator" is and how to send messages to the "participants"... I only have the PREPARE TRANSACTION <id>, COMMIT PREPARED <id> and ROLLBACK PREPARED <id> commands at my disposal)
Two phase commit
https://dba.stackexchange.com/questions/145656/dependent-transaction-in-separate-database-connections
https://www.endpointdev.com/blog/2010/07/distributed-transactions-and-two-phase/
https://www.citusdata.com/blog/2017/11/22/how-citus-executes-distributed-transactions/
(From a golang client-app) https://github.com/go-pg/pg/issues/490
Please, I'm really confused. I hope horse_with_no_name appears here and enlightens me (as has happened in the past), or any other charitable soul that can help me.
Thanks in advance!
Resolution (After Laurenz's Answer)
Connected to db1, these are the SQL lines to execute:
BEGIN;
-- DO THINGS TO BE DONE IN AN ALL-OR-NOTHING FASHION
-- Stop point --
PREPARE TRANSACTION 't1';
COMMIT PREPARED 't1'; -- or ROLLBACK PREPARED 't1' (the decision requires awareness and coordination)
Meanwhile, connected to db2, this is the script to execute:
BEGIN;
-- DO THINGS TO BE DONE IN AN ALL-OR-NOTHING FASHION
-- Stop point --
PREPARE TRANSACTION 't2';
COMMIT PREPARED 't2'; -- or ROLLBACK PREPARED 't2'
The -- Stop point -- is where a coordinator process (for example an application executing the statements, or a human behind a psql console or pgAdmin) shall stop the execution of both scripts (that is, not execute any further instruction; that is what I mean by stop).
Then, first on db1 (and then on db2, or vice versa), the coordinator process (whether human or not) must run PREPARE TRANSACTION on each connection.
If one of them fails, the coordinator must run ROLLBACK PREPARED on those databases where the transaction was already prepared, and ROLLBACK on the others.
If none fails, the coordinator must run COMMIT PREPARED on all involved databases, an operation that shall never fail (like exiting your house when you are one step outside with everything properly set to leave safely).
I think you misunderstood PREPARE TRANSACTION.
That statement ends work on the transaction, that is, it should be issued after all the work is done. The idea is that PREPARE TRANSACTION does everything that could potentially fail during a commit except for the commit itself. That is to guarantee that a subsequent COMMIT PREPARED cannot fail.
The idea is that processing is as follows:
Run START TRANSACTION on all databases involved in the distributed transaction.
Do all the work. If there are errors, ROLLBACK all transactions.
Run PREPARE TRANSACTION on all databases. If that fails anywhere, run ROLLBACK PREPARED on those databases where the transaction was already prepared and ROLLBACK on the others.
Once PREPARE TRANSACTION has succeeded everywhere, run COMMIT PREPARED on all involved databases.
That way, you can guarantee “all or nothing” across several databases.
One important component here that I haven't mentioned is the distributed transaction manager. It is a piece of software that persistently memorizes where in the above algorithm processing currently is so that it can clean up or continue committing after a crash.
Without a distributed transaction manager, two-phase commit is not worth a lot, and it is actually dangerous: if transactions get stuck in the “prepared” phase but are not committed yet, they will continue to hold locks and (in the case of PostgreSQL) block autovacuum work even through server restarts, as such transactions must needs be persistent.
This is difficult to get right.
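As an aside, you can spot transactions that are stuck in the prepared phase with the pg_prepared_xacts system view:
SELECT gid, prepared, owner, database FROM pg_prepared_xacts;
-- anything old in this list still holds locks and blocks autovacuum
-- until it is resolved with COMMIT PREPARED or ROLLBACK PREPARED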
The error is
ERROR: LOCK TABLE can only be used in transaction blocks
SQL state: 25P01
Why do I get that error, and what can I do to lock a table?
Locks in relational database systems are held until the end of the current transaction.
Now PostgreSQL uses autocommit mode, so if you don't start a transaction explicitly with BEGIN or START TRANSACTION, every statement will run in its own transaction.
An explicit table lock that is not part of an explicitly started multi-statement transaction is useless, because it would be gone at the end of the statement. But it could still harm concurrent transactions by blocking them, so it is forbidden.
Please reconsider before taking explicit table locks. In 99% of all cases they are unnecessary and a sign that the programmer didn't understand database concurrency techniques or didn't think hard enough. Explicit locks harm concurrency and can block autovacuum (sometimes with disastrous consequences).
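If you really do need an explicit table lock, run it inside an explicitly started transaction; a minimal sketch (the table name and lock mode are placeholders):
BEGIN;                                          -- explicit transaction block
LOCK TABLE mytable IN SHARE ROW EXCLUSIVE MODE; -- now allowed; held until the transaction ends
-- ... do the work that needs the lock ...
COMMIT;                                         -- the lock is released here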
I am trying to figure out if PostgreSQL query execution plans are stored somewhere (possibly as complementary to pg_stat_statements and pg_prepared_statements) in a way that makes them available for longer than the duration of the session. I understand that PREPARE does cache an SQL statement in pg_prepared_statements, though the plan itself does not seem to be available in any view as far as I can tell.
I am not sure if there is a doc explaining the life cycle of a query plan in PostgreSQL, but from what the EXPLAIN documentation suggests, PostgreSQL does not cache query plans at all. Is this accurate?
Thanks!
PostgreSQL has no shared storage for execution plans, so they cannot be reused across database sessions.
There are two ways to cache an execution plan within a session:
Use prepared statements with the SQL statements PREPARE and EXECUTE. The plan will be cached for the lifetime of the prepared statement, usually until your session ends.
Use PL/pgSQL functions. The plans for all static SQL statements (that is, statements that are not run with EXECUTE) in such a function will be cached for the lifetime of the session.
Other than that, execution plans are not cached in PostgreSQL.
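A minimal sketch of both options, using a hypothetical items table:
-- 1) Prepared statement: the plan is cached for this session only
PREPARE get_item(integer) AS SELECT * FROM items WHERE id = $1;
EXECUTE get_item(1);
EXECUTE get_item(2); -- reuses the cached plan (a generic plan after a few executions)

-- 2) PL/pgSQL: plans for static SQL in the function body are cached per session
CREATE FUNCTION item_count() RETURNS bigint
LANGUAGE plpgsql AS $$
BEGIN
    RETURN (SELECT count(*) FROM items); -- static SQL, its plan is cached after the first call
END;
$$;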
If NHibernate does generate prepared statements, is there a way to query the database to confirm that they are being generated, and to see how much memory the prepared statements are consuming?
First, you can check whether prepared statements exist on your connection by querying the pg_prepared_statements system view. Any prepared statement should show up as a row there. Take care to query pg_prepared_statements on the same connection you're interested in checking. I'm not aware of any way of knowing how much memory the statements are consuming though.
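For example, after preparing something on the connection, a query like this should list it:
SELECT name, statement, prepare_time, parameter_types
FROM pg_prepared_statements;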
If NHibernate doesn't prepare statements, you may want to check out Npgsql 3.2's new automatic preparation feature. When activated (it's off by default), Npgsql will track statements and automatically prepare frequently-used ones. See Npgsql's performance docs for more information.