PostgreSQL UPDATE not updating, query finishes successfully - postgresql

I'm trying to do this:
api-prod::DATABASE=> SELECT winner from factoring_bid WHERE id = 16184;
winner
--------
f
(1 row)
api-prod::DATABASE=> SELECT winner from factoring_bid WHERE id = 16184;
winner
--------
f
(1 row)
api-prod::DATABASE=> UPDATE factoring_bid set winner = 't' WHERE id=16184;
UPDATE 1
api-prod::DATABASE=> SELECT winner from factoring_bid WHERE id = 16184;
winner
--------
f
(1 row)
The UPDATE seems to run fine, but changes to this row (and some others) never actually take effect. And yes, the user running this query has write permissions.
More info:
This is a Heroku postgresql database.
Updating other rows in the factoring_bid table works just fine.
The factoring_bid table is related to another table called factoring_auction. All factoring_bids related to the factoring_auction from the example have the same problem: they can't be updated.
factoring_bids related to other auctions have no problem. Maybe my application (a Django Rest Framework API) introduced some error in the bids of this specific auction? I can't think of anything, but it seems odd that they are related.
We don't use explicit locks, and there seems to be no lock active on the database at the moment.
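For reference, one way to double-check that from psql is a query along these lines (just a sketch; the columns picked from pg_stat_activity and pg_locks are only a reasonable selection):
-- show every non-idle backend together with the locks it holds or waits for
SELECT a.pid, a.state, a.xact_start, a.wait_event_type, l.mode, a.query
FROM pg_stat_activity a
JOIN pg_locks l ON l.pid = a.pid
WHERE a.state <> 'idle';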

Related

Group By expression in Oracle 19c

I am installing a database schema in Oracle 19c, and the installation scripts have been used repeatedly in Oracle 12 without problems.
My problem with 19c is that when it runs our views script, it throws an error on some of the views. The error we are seeing is "not a GROUP BY expression".
We have a few views where for example we have something like this:
SELECT name, TRUNC(date) as Day
FROM sometable
GROUP BY name, TRUNC(date)
It points the error at the SELECT list, as though it doesn't see that the expression is already in the GROUP BY clause.
As said, these queries have worked fine in Oracle 12 for years; it is only now, moving to 19c, that we are seeing problems.
Is this a bug in 19c or does something need to be applied?
19c, you say? Can't reproduce it.
SQL> select banner from v$version;
BANNER
--------------------------------------------------------------------------------
Oracle Database 19c Enterprise Edition Release 19.0.0.0.0 - Production
Here's the table I used (so that you wouldn't say that this is the culprit). Obviously, date is an invalid column name; it's reserved for the DATE datatype.
SQL> create table sometable as
2 select 'Little' name, sysdate datum
3 from dual
4 connect by level <= 3;
Table created.
Does the query itself work? Yes:
SQL> select name, trunc(datum) as day
2 from sometable
3 group by name, trunc(datum);
NAME DAY
------ --------
Little 26.01.22
Note that - as you aren't aggregating anything - you could have used DISTINCT instead of GROUP BY:
SQL> select DISTINCT name, trunc(datum) as day
2 from sometable;
NAME DAY
------ --------
Little 26.01.22
Can I create a view? Yes:
SQL> create view v_sometable as
2 select name, trunc(datum) as day
3 from sometable
4 group by name, trunc(datum);
View created.
SQL>
As I said, I can't reproduce it.
Please, copy/paste your own SQL*Plus session (just like I did) so that we can see what exactly you did and how Oracle responded.

Optimise a simple update query for large dataset

I have some data migration that has to occur between a parent and child table. For the sake of simplicity, the schemas are as follows:
 -------        -----------
| event |      | parameter |
 -------        -----------
| id    |      | id        |
| order |      | eventId   |
 -------       | order     |
                -----------
Because of an oversight in business logic that needs to be performed, we need to update parameter.order to match the parent event.order. I have come up with the following SQL to do that:
UPDATE "parameter"
SET "order" = e."order"
FROM "event" e
WHERE "eventId" = e.id
The problem is that this query hadn't finished after more than 4 hours and I had to clock out, so I cancelled it.
There are 11 million rows on parameter and 4 million rows on event. I've run EXPLAIN on the query and it tells me this:
Update on parameter  (cost=706691.80..1706622.39 rows=11217313 width=155)
  ->  Hash Join  (cost=706691.80..1706622.39 rows=11217313 width=155)
        Hash Cond: (parameter."eventId" = e.id)
        ->  Seq Scan on parameter  (cost=0.00..435684.13 rows=11217313 width=145)
        ->  Hash  (cost=557324.91..557324.91 rows=7724791 width=26)
              ->  Seq Scan on event e  (cost=0.00..557324.91 rows=7724791 width=26)
Based on this article, the "cost" referenced by EXPLAIN is an "arbitrary unit of computation".
Ultimately, this update needs to be performed, but I would accept it happening in one of two ways:
I am advised of a better way to do this query that executes in a timely manner (I'm open to all suggestions, including updating schemas, indexing, etc.)
The query remains the same, but I can somehow get an accurate prediction of the execution time (even if it's hours long). That way, at least, I can manage the team's expectations. I understand that the time can't be known without actually running the query, but is there an easy way to "convert" these arbitrary cost units into some millisecond execution time?
Edit for Jim Jones' comment:
I executed the following query:
SELECT psa.pid,locktype,mode,query,query_start,state FROM pg_locks pl LEFT JOIN pg_stat_activity psa ON pl.pid = psa.pid
I got 9 identical rows like the following:
  pid  | locktype |      mode       |    query    |     query_start     | state
-------+----------+-----------------+-------------+---------------------+--------
 23192 | relation | AccessShareLock | <see below> | 2021-10-26 14:10:01 | active
query column:
--update parameter
--set "order" = e."order"
--from "event" e
--where "eventId" = e.id
SELECT psa.pid,locktype,mode,query,query_start,state FROM pg_locks pl LEFT JOIN pg_stat_activity psa ON pl.pid = psa.pid
Edit 2: I think I've been stupid here... The query produced by checking these locks is just the commented query. I think that means there's actually nothing to report.
If some rows already have the target value, you can skip those empty updates, which would otherwise be executed at full cost. Like:
UPDATE parameter p
SET "order" = e."order"
FROM event e
WHERE p."eventId" = e.id
AND p."order" IS DISTINCT FROM e."order"; -- this
If both "order" columns are defined NOT NULL, simplify to:
...
AND p."order" <> e."order";
See:
How do I (or can I) SELECT DISTINCT on multiple columns?
If you have to update all or most rows - and can afford it! - writing a new table may be cheaper overall, as Mike already mentioned. But concurrency and dependent objects may stand in the way.
Aside: use legal, lower-case identifiers, so you don't have to double-quote. Makes your life with Postgres easier.
The query will be slow because for each UPDATE operation, it has to look up the index by id. Even with an index, on a large table, this is a per-row read/write so it is slow.
I'm not sure how to get a good estimate, maybe do 1% of the table and multiply?
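One rough way to do that sampling (a sketch only: it assumes parameter.id is an integer that is spread fairly evenly, and EXPLAIN ANALYZE really executes the sampled update before the ROLLBACK discards it):
BEGIN;
EXPLAIN (ANALYZE, BUFFERS)
UPDATE parameter p
SET "order" = e."order"
FROM event e
WHERE p."eventId" = e.id
  AND p.id % 100 = 0;   -- roughly 1% of the rows
ROLLBACK;               -- throw the sampled changes away
Multiplying the reported execution time by about 100 only gives a lower bound, since the full update also accumulates more WAL, index maintenance and vacuum overhead.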
I suggest creating a new table, then dropping the old one and renaming the new table.
CREATE TABLE parameter_new AS
SELECT
    parameter.id,
    parameter."eventId",
    e."order"
FROM
    parameter
    JOIN event AS e ON e.id = parameter."eventId";
Later, once you verify things:
ALTER TABLE parameter RENAME TO parameter_old;
ALTER TABLE parameter_new RENAME TO parameter;
Later, once you're completely certain:
DROP TABLE parameter_old;
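One caveat with this approach: a table built with CREATE TABLE ... AS SELECT comes without indexes, constraints, defaults or grants, so after the rename you would typically recreate at least the primary key and the foreign-key index (the index name below is just a placeholder):
ALTER TABLE parameter ADD PRIMARY KEY (id);
CREATE INDEX parameter_eventid_idx ON parameter ("eventId");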

Transaction Isolation Across Multiple Tables using PostgreSQL MVCC

Question Summary
This is a question about serializability of queries within a SQL transaction.
Specifically, I am using PostgreSQL. It may be assumed that I am using the most current version of PostgreSQL. From what I have read, I believe the technology used to support what I am trying to do is known as "MultiVersion Concurrency Control", or "MVCC".
To sum it up: If I have one primary table, and more-than-1 foreign-key-linked table connected to that primary table, how do I guarantee that, for a given key in the tables, and any number of SELECT statements using that key inside one transaction, each of which is SELECTing from any of the linked tables, I will get data as it existed at the time I started the transaction?
Other Questions
This question is similar, but broader, and the question and answer did not relate specifically to PostgreSQL:
Transaction isolation and reading from multiple tables on SQL Server Express and SQL Server 2005
Example
Let's say I have 3 tables:
bricks
    brickworks       (primary key)
    completion_time  (primary key)
    has_been_sold

brick_colors
    brickworks       (primary key, foreign key pointing to "bricks")
    completion_time  (primary key, foreign key pointing to "bricks")
    quadrant         (primary key)
    color

brick_weight
    brickworks       (primary key, foreign key pointing to "bricks")
    completion_time  (primary key, foreign key pointing to "bricks")
    weight
A brickworks produces one brick at a time. It makes bricks that may be of different colors in each of its 4 quadrants.
Someone later analyzes the bricks to determine their color combination, and writes the results to the brick_colors table.
Someone else analyzes the bricks to determine their weight, and writes the results to the brick_weight table.
At any given time, an existing brick may or may not have a recorded color, and may or may not have a recorded weight.
An application exists, and this application receives word that someone wants to buy a particular brick (already known at this point to the application by its brickworks/completion_time composite key).
The application wants to select all known properties of the brick AT THE EXACT TIME IT STARTS THE QUERY.
If color or weight information is added MID-TRANSACTION, the application does NOT want to know about it.
The application wants to perform SEPARATE QUERIES (not a SELECT with multiple JOINs to the foreign-key-linked tables, which might return multiple rows because of the brick_colors table).
This example is deliberately simple; the desire to do this without one SELECT with multiple JOINs would be clearer if my example included, say, 10 foreign-key-linked tables, and many or all of them could return multiple rows for the same primary key (like brick_colors does in the example as I have it above).
Attempted Solution
Here's what I've come up with so far:
BEGIN TRANSACTION ISOLATION LEVEL SERIALIZABLE READ ONLY ;
-- All this statement accomplishes is telling the database what rows should be returned from the present point-in-time in future queries within the transaction
SELECT DISTINCT true
FROM bricks b
LEFT JOIN brick_colors bc ON bc.brickworks = b.brickworks AND bc.completion_time = b.completion_time
LEFT JOIN brick_weight bw ON bw.brickworks = b.brickworks AND bw.completion_time = b.completion_time
WHERE b.brickworks = 'Brick-o-Matic' AND b.completion_time = '2017-02-01T07:35:00.000Z' ;
SELECT * FROM brick_colors WHERE brickworks = 'Brick-o-Matic' AND completion_time = '2017-02-01T07:35:00.000Z' ;
SELECT * FROM brick_weight WHERE brickworks = 'Brick-o-Matic' AND completion_time = '2017-02-01T07:35:00.000Z' ;
COMMIT ;
It just seems wasteful to use that first SELECT with the JOINs solely for purposes of ensuring serializability.
Is there any other way to do this?
References
PostgreSQL Concurrency Control
PostgreSQL Transaction Isolation
PostgreSQL SET TRANSACTION statement
This is the essence of your question:
how do I guarantee that, for ...... any number of SELECT statements
..... inside one transaction ....... I will get data as it existed at
the time I started the transaction?
This is exactly what Repeatable Read Isolation Level guarantees:
The Repeatable Read isolation level only sees data committed before
the transaction began; it never sees either uncommitted data or
changes committed during transaction execution by concurrent
transactions. (However, the query does see the effects of previous
updates executed within its own transaction, even though they are not
yet committed.) This is a stronger guarantee than is required by the
SQL standard for this isolation level, and prevents all of the
phenomena described in Table 13-1. As mentioned above, this is
specifically allowed by the standard, which only describes the minimum
protections each isolation level must provide.
This level is different from Read Committed in that a query in a
repeatable read transaction sees a snapshot as of the start of the
transaction, not as of the start of the current query within the
transaction. Thus, successive SELECT commands within a single
transaction see the same data, i.e., they do not see changes made by
other transactions that committed after their own transaction started.
A practical example - let's say we have 2 simple tables:
CREATE TABLE t1( x int );
INSERT INTO t1 VALUES (1),(2),(3);
CREATE TABLE t2( y int );
INSERT INTO t2 VALUES (1),(2),(3);
The number of tables, their structures, primary keys, foreign keys, etc. are unimportant here.
Let's open a first session, start a REPEATABLE READ transaction, and run two simple, separate SELECT statements:
test=# START TRANSACTION ISOLATION LEVEL REPEATABLE READ;
START TRANSACTION
test=# SELECT * FROM t1;
x
---
1
2
3
(3 rows)
test=# SELECT * FROM t2;
y
---
1
2
3
(3 rows)
Note that the START TRANSACTION command automatically disables autocommit mode in the session.
Now, in another session (with the default autocommit mode enabled), insert a few records into t1:
test2=# INSERT INTO t1 VALUES(10),(11);
The new values were inserted and automatically committed (because autocommit is on).
Now go back to the first session and run SELECT again:
test=# select * from t1;
x
---
1
2
3
(3 rows)
As you see, session 1 (with an active repeatable read transaction) doesn't see any changes committed after the start of the transaction.
Let's do the same experiment with table t2 - go to the second session and issue:
test2=# DELETE FROM t2 WHERE y = 2;
DELETE 1
Now go back to the first session and run SELECT again:
test=# SELECT * FROM t2;
y
---
1
2
3
(3 rows)
As you see, again, session 1 (with an active repeatable read transaction) doesn't see any changes committed after the start of the transaction.
And now, in session 1, finish the transaction by issuing COMMIT, and then SELECT again:
test=# SELECT * FROM t1;
x
---
1
2
3
(3 rows)
test=# SELECT * FROM t2;
y
---
1
2
3
(3 rows)
test=# COMMIT;
COMMIT
test=# select * from t1;
x
----
1
2
3
10
11
(5 rows)
test=# select * from t2;
y
---
1
3
(2 rows)
As you see, while the repeatable read transaction is started and active, you can run many separate SELECT statements multiple times, and all of them see the same stable snapshot of data as of the start of the transaction, regardless of any data committed in other sessions.
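Applied to the tables from the question, the attempted solution can therefore be reduced to something like this (a sketch; note that PostgreSQL actually takes the snapshot at the first statement inside the transaction, not at BEGIN itself):
BEGIN TRANSACTION ISOLATION LEVEL REPEATABLE READ READ ONLY;
SELECT * FROM brick_colors WHERE brickworks = 'Brick-o-Matic' AND completion_time = '2017-02-01T07:35:00.000Z';
SELECT * FROM brick_weight WHERE brickworks = 'Brick-o-Matic' AND completion_time = '2017-02-01T07:35:00.000Z';
COMMIT;
Both SELECTs see the same snapshot, so the initial SELECT with the LEFT JOINs from the attempted solution is not needed.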

Cannot kill a Postgres query

So, this is a problem I've had a few times – where I'll accidentally do a SELECT * from a giant database. Normally, I just go in, get the pid of the query (SELECT * FROM pg_stat_activity), and then do a SELECT pg_cancel_backend(PID here), and voilà, it ends. But sometimes – especially with queries that will eventually yield a ridiculous number of rows – it just returns this:
db=# select pg_cancel_backend(5246);
pg_cancel_backend
-------------------
t
(1 row)
...and the query lives on! How do I kill these things??
I would try pg_terminate_backend(pid);
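For example, reusing the pid from the question (pg_terminate_backend sends SIGTERM to the backend process, whereas pg_cancel_backend only sends SIGINT to cancel the current query):
select pg_terminate_backend(5246);   -- 5246 is the pid from the question
-- if you need to find the pid again first:
select pid, state, query from pg_stat_activity where state = 'active';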

Lock the rows until the next select - postgres

Is there a way in Postgres to lock rows until the next SELECT query is executed from the same system? One more thing: there will be no UPDATE on the locked rows.
The scenario is something like this:
If table1 contains data like
id | txt
-------------------
1 | World
2 | Text
3 | Crawler
4 | Solution
5 | Nation
6 | Under
7 | Padding
8 | Settle
9 | Begin
10 | Large
11 | Someone
12 | Dance
If sys1 executes
select * from table1 order by id limit 5;
then it should lock the rows with id 1 to 5 against other systems that are executing SELECT statements concurrently.
Later, if sys1 executes another SELECT query like
select * from table1 where id>10 order by id limit 5;
then the previously locked rows should be released.
I don't think this is possible. You cannot block read-only access to a table (unless that SELECT is done FOR UPDATE).
As far as I can tell, the only chance you have is to use the pg_advisory_lock() function.
http://www.postgresql.org/docs/current/static/functions-admin.html#FUNCTIONS-ADVISORY-LOCKS
But this requires a "manual" release of the locks obtained through it. You won't get an automatic unlocking with that.
To lock the rows you would need something like this:
select pg_advisory_lock(id), *
from
(
select * from table1 order by id limit 5
) t
(Note the use of the derived table for the LIMIT part. See the manual link I posted for an explanation)
Then you need to store the retrieved IDs and later call pg_advisory_unlock() for each ID.
If each process is always releasing all IDs at once, you could simply use pg_advisory_unlock_all() instead. Then you will not need to store the retrieved IDs.
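A rough sketch of the release side (the id of 1 below just stands in for whatever ids were actually stored):
select pg_advisory_unlock(1);      -- repeat for each stored id
-- or, when the process always releases everything it holds at once:
select pg_advisory_unlock_all();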
Note that this will not prevent others from reading the rows using "normal" selects. It will only work if every process that accesses that table uses the same pattern of obtaining the locks.
It looks like you really have a transaction which transcends the borders of your database, and all the change happens in another system.
My idea is to select ... for update nowait to lock the relevant rows, then offload the data into another system, then roll back to unlock the rows. No two select ... for update queries will select the same row, and the second select will fail immediately rather than wait and proceed.
But you don't seem to mark offloaded records in any way; I don't see why two non-consecutive selects wouldn't happily select an overlapping range. So I'd still update the records with a flag and/or a target user name, and would only select records with the flag unset.
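A rough sketch of that flag-based variant (the claimed_by column is hypothetical and would have to be added to table1; FOR UPDATE NOWAIT makes a competing system fail immediately instead of waiting):
BEGIN;
UPDATE table1
SET claimed_by = 'sys1'               -- claimed_by is a hypothetical flag column
WHERE id IN (
    SELECT id
    FROM table1
    WHERE claimed_by IS NULL          -- only rows nobody has claimed yet
    ORDER BY id
    LIMIT 5
    FOR UPDATE NOWAIT                 -- fail at once if another system holds them
);
COMMIT;
-- afterwards, process the rows WHERE claimed_by = 'sys1' and clear the flag when done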
I tried both select ... for update and pg_try_advisory_lock and managed to get close to my requirement.
/* rows are locked, but the LIMIT is the problem */
select * from table1 where pg_try_advisory_lock( id) limit 5;
.
.
$_SESSION['rows'] = $rowcount; // no of row to process
.
.
/* after each word is processed */
$_SESSION['rows'] -=1;
.
.
/*and finally unlock locked rows*/
if ($_SESSION['rows']===0)
select pg_advisory_unlock_all() from table1
But there are two problems with this:
1. As the LIMIT applies before the lock, every instance tries to lock the same rows each time.
2. Not sure whether pg_advisory_unlock_all will unlock only the rows locked by the current instance, or those locked by all instances.