I made a mistake with an update query on a very large database. I realised my mistake while the update/query was still running and clicked 'cancel query'.
I've checked the History panel (pgAdmin3) and it says 'Execution Cancelled'. It does not say anything about any rows being affected.
Does this mean that no rows were affected by the update? Is there a way to check a log of some sort to see if any rows were affected?
Postgres runs a query in a transaction. The transaction completes as a whole or is cancelled as a whole. This property is called atomicity (the A in ACID).
If your query was cancelled, no rows were affected.
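An illustration of that atomicity (the table name is made up): a statement sent outside an explicit transaction runs in its own implicit transaction, so cancelling it rolls back everything it had changed so far.

UPDATE big_table SET flag = true;          -- cancelled while still running
SELECT count(*) FROM big_table WHERE flag; -- same count as before the cancelled UPDATE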
Related
I have a table with 500k rows. Now I want to add a new column (type boolean, nullable = false) without a default value. The query to do so has been running seemingly forever.
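Something like this (table and column names invented, the original statement was not shown):

ALTER TABLE items ADD COLUMN is_active boolean NOT NULL;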
I'm using PostgreSQL 12.1, compiled by Visual C++ build 1914, 64-bit on my Windows 2012 Server
In pgAdmin I can see that the query is blocked by PID 0. But when I execute the following query, I can't see any entry with pid = 0:
SELECT *
FROM pg_stat_activity
Can someone help me here? Why is the query blocked, and how can I fix this so I can add a new column to my table?
Update (attempt):
SELECT *
FROM pg_prepared_xacts
Update
It works after rolling back all prepared transactions.
ROLLBACK PREPARED 'gid goes here';
You have got stale prepared transactions. I say that as in "you have got the measles", because it is a disease for a database.
Such prepared transactions keep holding locks and block autovacuum progress, so they will bring your database to its knees if you don't take action. In addition, such transactions are persisted, so even a restart of the database won't get rid of them.
Remove them with
ROLLBACK PREPARED 'gid goes here'; /* use the transaction names shown in the view */
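To find the transaction names (GIDs) to roll back, query the pg_prepared_xacts view that was already used above, for example:

SELECT gid, prepared, owner, database
FROM pg_prepared_xacts
ORDER BY prepared;   -- oldest (most likely stale) first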
If you use prepared transactions, you need a distributed transaction manager. That is a piece of software that keeps track of all prepared transactions and their state and persists that information, so that no distributed transaction can become stale. Even if there is a crash, the distributed transaction manager will resolve in-doubt transactions in all involved databases.
If you don't have one, don't use prepared transactions; you now know why. In that case it is best to set max_prepared_transactions to 0.
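One way to do that is shown below (a server restart is required for the change to take effect):

ALTER SYSTEM SET max_prepared_transactions = 0;  -- PREPARE TRANSACTION will then be refused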
I'm pretty new to PostgreSQL and I'm sure I'm missing something here.
The scenario is with version 11, executing a big drop table and insert transaction on a given table with the nodejs driver, which may take 30 minutes.
While doing that, if I try to query with select on that table using the jdbc driver, the query execution waits for the transaction to finish. If I close the transaction (by finishing it or by forcing it to exit), the jdbc query becomes responsive.
I thought I could read a table with one connection while performing a transaction with another one.
What am I missing here?
Should I keep the table (without dropping it at the beginning of the transaction)?
DROP TABLE takes an ACCESS EXCLUSIVE lock on the table, which is there precisely to prevent it from taking place concurrently with any other operation on the table. After all, DROP TABLE physically removes the table.
Since all locks are held until the end of the database transaction, all access to the dropped table is blocked until the transaction ends.
Of course the files are only removed when the transaction commits, so you might wonder why PostgreSQL doesn't let concurrent transactions read in the meantime. But that would mean that COMMIT may be blocked by a concurrent reader, or that a SELECT might hit a system error in the middle of reading, neither of which sounds appealing.
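An illustration of the blocking (the table name is invented):

-- Session 1 (the long-running transaction)
BEGIN;
DROP TABLE big_table;            -- takes ACCESS EXCLUSIVE on big_table
-- ... recreate the table and insert, possibly for 30 minutes ...

-- Session 2 (any other connection)
SELECT count(*) FROM big_table;  -- blocks here until session 1 commits or rolls back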
I have a concurrent workflow which inserts a record with a unique index on column A and column B, and if this is successful, performs an async action that cannot be rolled back (API request), inside a single transaction.
Said API request should only ever happen once, but currently it's possible that it gets triggered multiple times if that record is inserted in parallel.
If I'm not mistaken, the way to solve this problem is to set a lock on the offending row to make sure that any parallel inserts wait until the initial transaction is complete.
Which lock would be the correct one for this use case?
No need for an explicit lock.
If a second transaction inserts the same values for the primary key (or any unique index) that another, uncommitted transaction has already inserted, the second will wait until the first transaction commits or rolls back.
If the first transaction rolls back, the second will succeed. If the first transaction commits, the second will get a "unique key violation" error.
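A small illustration of that behaviour (the table is invented and assumed to have UNIQUE (a, b)):

-- Session 1
BEGIN;
INSERT INTO t (a, b) VALUES (1, 2);  -- not yet committed

-- Session 2
INSERT INTO t (a, b) VALUES (1, 2);  -- waits here
-- if session 1 commits:    ERROR: duplicate key value violates unique constraint
-- if session 1 rolls back: the insert succeeds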
I'm using PostgreSQL 9.2 in a Windows environment.
I'm in a 2PC (2 phase commit) environment using MSDTC.
I have a client application that starts a transaction at the SERIALIZABLE isolation level, inserts a new row of data in a table for a specific foreign key value (there is an index on the column), and votes for completion of the transaction (the transaction is PREPARED). The transaction will be COMMITTED by the transaction coordinator.
Immediately after that, outside of a transaction, the same client requests all the rows for this same specific foreign key value.
Because there may be a delay before the previous transaction is really committed, the SELECT may return a previous snapshot of the data. In fact, it does happen sometimes, and this is problematic. Of course the application could be redesigned, but until then I'm looking for a locking solution. An advisory lock?
I already solved the problem for UPDATEs on specific rows by using SELECT ... FOR SHARE, and it works well. The SELECT waits until the transaction commits and returns old and new rows.
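A sketch of that working UPDATE case (table and column names, and the GID, are invented):

-- Writer, inside the distributed transaction
UPDATE child_rows SET payload = '...' WHERE parent_id = 42;
PREPARE TRANSACTION 'gid-1';
-- the coordinator later issues: COMMIT PREPARED 'gid-1'

-- Reader, immediately afterwards
SELECT * FROM child_rows WHERE parent_id = 42 FOR SHARE;  -- waits for the row locks to be released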
Now I'm trying to solve it for INSERT.
SELECT ... FOR SHARE does not block and returns immediately.
There is no concurrency issue here as only one client deals with a specific set of rows. I already know about MVCC.
Any help appreciated.
To wait for a not-yet-committed INSERT you'd need to take a predicate lock. There's limited predicate locking in PostgreSQL for the serializable support, but it's not exposed directly to the user.
Simple SERIALIZABLE isolation won't help you here, because SERIALIZABLE only requires that there be an order in which the transactions could've occurred to produce a consistent result. In your case this ordering is SELECT followed by INSERT.
The only option I can think of is to take an ACCESS EXCLUSIVE lock on the table before INSERTing. This will only get released at COMMIT PREPARED or ROLLBACK PREPARED time, and in the meantime any other queries will wait for the lock. You can enforce this via a BEFORE trigger to avoid the need to change the app. You'll probably get the odd deadlock and rollback if you do it that way, though, because INSERT takes a lower-level lock first and then you attempt lock promotion in the trigger. If possible it's better to run the LOCK TABLE ... IN ACCESS EXCLUSIVE MODE command before the INSERT.
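A sketch of that LOCK TABLE variant (table name and GID are invented); the lock taken before the INSERT is kept by the prepared transaction:

BEGIN;
LOCK TABLE child_rows IN ACCESS EXCLUSIVE MODE;  -- blocks all other access to the table
INSERT INTO child_rows (parent_id, payload) VALUES (42, '...');
PREPARE TRANSACTION 'gid-2';
-- concurrent SELECTs now wait until the coordinator runs
-- COMMIT PREPARED 'gid-2' (or ROLLBACK PREPARED 'gid-2')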
As you've alluded to, this is mostly an application mis-design problem. Expecting to see not-yet-committed rows doesn't really make any sense.
I have a PostgreSQL 9.2.2 database that serves orders to my ERP system. The database tables contain, among other things, boolean columns indicating whether a customer has been added. The code I use extracts the rows from the database and sends them to our ERP system one at a time (single-threaded). My code works perfectly in this regard; however, over the past year our volume has grown enough to require a multi-threaded solution.
I don't think the MVCC modes will work for me because the added_customer column is only updated once a customer has been successfully added. The default MVCC modes could cause the same row to be worked on at the same time resulting in duplicate web service calls. What I want to avoid is duplicate web service calls to our ERP system as they can be rather heavy, although admittedly I am not an expert on MVCC nor the other modes that PostgreSQL provides.
My question is: How can I be sure that a row, or series of rows returned in one select statement are excluded from other queries to the database in separate threads?
You will need to record the fact that the rows are being processed somehow. You will also need to deal with concurrent attempts to mark them as being processed, and handle failures when sending them to your ERP system.
You may find SELECT ... FOR UPDATE useful to get a set of rows and simultaneously lock them against updates. One approach might be for each thread to select a target row, try to add its ID to a "processing" table, then remove it in the same transaction in which you update added_customer.
If a thread fetches no candidate rows, or fails to insert, it just needs to sleep briefly and try again. If anything goes badly wrong, you should have rows left in the "processing" table that you can inspect and correct.
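A sketch of that claim-and-process flow (the orders and processing tables and their columns are invented; a unique constraint on processing.order_id makes concurrent claims fail cleanly):

-- 1. Claim one unprocessed row
BEGIN;
SELECT id FROM orders
WHERE added_customer = false
  AND id NOT IN (SELECT order_id FROM processing)
LIMIT 1
FOR UPDATE;                                   -- row-lock the candidate

INSERT INTO processing (order_id, claimed_by)
VALUES (<id from above>, pg_backend_pid());   -- fails if another thread won the race
COMMIT;

-- 2. Call the ERP web service, outside any open transaction

-- 3. Mark the row done and release the claim
BEGIN;
UPDATE orders SET added_customer = true WHERE id = <id>;
DELETE FROM processing WHERE order_id = <id>;
COMMIT;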
Of course the other option is to just grab a set of candidate rows and spawn a separate process/thread for each that communicates with the ERP. That keeps the database fetching single-threaded while allowing multiple channels to the ERP.
You can add a column user_is_proccesed to the table. It can hold the process ID of the backend that updates the record.
Then use a small serializable transaction to set user_is_proccesed and "lock" the row for processing.
Something like:
START TRANSACTION ISOLATION LEVEL SERIALIZABLE;
UPDATE user_table
SET user_is_proccesed = pg_backend_pid()
WHERE <some condition>
AND user_is_proccesed IS NULL; -- no one is processing it now
COMMIT;
The key thing here: with SERIALIZABLE, only one transaction can successfully update the record (all other concurrent SERIALIZABLE updates will fail with ERROR: could not serialize access due to concurrent update).