Locks on updating rows with foreign key constraint - postgresql

I tried executing the same UPDATE query twice like below.
First time the transaction has no lock but I can see a row lock after second query.
Schema:
test=# \d t1
Table "public.t1"
Column | Type | Collation | Nullable | Default
--------+---------+-----------+----------+---------
i | integer | | not null |
j | integer | | |
Indexes:
"t1_pkey" PRIMARY KEY, btree (i)
Referenced by:
TABLE "t2" CONSTRAINT "t2_j_fkey" FOREIGN KEY (j) REFERENCES t1(i)
test=# \d t2
Table "public.t2"
Column | Type | Collation | Nullable | Default
--------+---------+-----------+----------+---------
i | integer | | not null |
j | integer | | |
k | integer | | |
Indexes:
"t2_pkey" PRIMARY KEY, btree (i)
Foreign-key constraints:
"t2_j_fkey" FOREIGN KEY (j) REFERENCES t1(i)
Existing data:
test=# SELECT * FROM t1 ORDER BY i;
i | j
---+---
1 | 1
2 | 2
(2 rows)
test=# SELECT * FROM t2 ORDER BY i;
i | j | k
---+---+---
3 | 1 |
4 | 2 |
(2 rows)
UPDATE queries and row lock status:
test=# BEGIN;
BEGIN
test=# UPDATE t2 SET k = 123 WHERE i = 3;
UPDATE 1
test=# SELECT * FROM t1 AS t, pgrowlocks('t1') AS p WHERE p.locked_row = t.ctid;
i | j | locked_row | locker | multi | xids | modes | pids
---+---+------------+--------+-------+------+-------+------
(0 rows)
test=# UPDATE t2 SET k = 123 WHERE i = 3;
UPDATE 1
test=# SELECT * FROM t1 AS t, pgrowlocks('t1') AS p WHERE p.locked_row = t.ctid;
i | j | locked_row | locker | multi | xids | modes | pids
---+---+------------+--------+-------+----------+-------------------+------
1 | 1 | (0,1) | 107239 | f | {107239} | {"For Key Share"} | {76}
(1 row)
test=#
Why does postgres try to get a row lock only on second time?
By the way, queries updating column t2.j create new lock (ForKeyShare) on t1 row at once. This behavior make sense because t2.j has foreign key constraint references t1.i. But the queries above seems not.
Does anyone can explain this lock?
PostgreSQL version: 9.6.3

Okay, I got it.
http://blog.nordeus.com/dev-ops/postgresql-locking-revealed.htm
This is optimization that exists in Postgres. If locking manager can figure out from the first query that foreign key is not changed (it is not mentioned in update query or is set to same value) it will not lock parent table. But in second query it will behave as it is described in documentation (it will lock parent table in ROW SHARE locking mode and referenced row in FOR SHARE mode)
It seems MySQL is wiser about foreign key locks because the same UPDATE query doesn't make such locks on MySQL.

Related

unique constraint values in postgres

I applied over a postgres USER table and unique contraint over email. The problem that I am facing now is that the constraint seems to register each value I insert (or try to insert) no matter if a record with that value exists or not.
I.e
Table:
id
user
1
mail#gmail.com
2
mail2#gmail.com
if i insert mail3#gmail.com, delete the value and try to insert mail3#gmail.com again it says:
sqlalchemy.exc.IntegrityError: (psycopg2.errors.UniqueViolation) duplicate key value violates unique constraint "email"
my doubt is: the unique constraint guarantees that the value is newer always or that there is only one record of that value in the column?
documentations says it the second but the experience shows is the first one
more details:
| Column | Type | Nullable |
|------------------|-----------------------------|----------|
| id | integer | not null |
| email | character varying(100) | |
| password | character varying(100) | |
| name | character varying(1000) | |
| lastname | character varying(1000) | |
| dni | character varying(20) | |
| cellphone | character varying(20) | |
| accepted_terms | boolean | |
| investor_test | boolean | |
| validated_email | boolean | |
| validated_cel | boolean | |
| last_login_at | timestamp without time zone | |
| current_login_at | timestamp without time zone | |
| last_login_ip | character varying(100) | |
| current_login_ip | character varying(100) | |
| login_count | integer | |
| active | boolean | |
| fs_uniquifier | character varying(255) | not null |
| confirmed_at | timestamp without time zone | |
Indexes:
"bondusers_pkey" PRIMARY KEY, btree (id)
"bondusers_email_key" UNIQUE CONSTRAINT, btree (email)
"bondusers_fs_uniquifier_key" UNIQUE CONSTRAINT, btree (fs_uniquifier)
Insert Statement:
INSERT INTO bondusers (email, password, name, lastname, dni, cellphone, accepted_terms, investor_test, validated_email, validated_cel, last_login_at, current_login_at, last_login_ip, current_login_ip, login_count, active, fs_uniquifier, confirmed_at) VALUES ('mail3#gmail.com', '$pbkdf2-sha256$29000$XyvlfI8x5vwfYwyhtBYi5A$Hhfrzvqs94MjTCmDOVmmnbUyf7ho4kLEY8UYUCdHPgM', 'mail', 'mail3', '123123123', '1139199196', false, false, false, false, NULL, NULL, NULL, NULL, NULL, true, '1c4e60b34a5641f4b560f8fd1d45872c', NULL);
ERROR: duplicate key value violates unique constraint "bondusers_fs_uniquifier_key"
DETAIL: Key (fs_uniquifier)=(1c4e60b34a5641f4b560f8fd1d45872c) already exists.
but when:
select * from bondusers where fs_uniquifier = '1c4e60b34a5641f4b560f8fd1d45872c';
result is 0 rows
I assume that if you run the INSERT, DELETE, INSERT directly within Postgres command line it works OK?
I noticed your error references SQLAlchemy (sqlalchemy.exc.IntegrityError), so I think it may be that and not PostgreSQL. Within a transaction SQLAlchemy's Unit of Work pattern can re-order SQL statements for performance reasons.
The only ref I could find was here https://github.com/sqlalchemy/sqlalchemy/issues/5735#issuecomment-735939061 :
if there are no dependency cycles between the target tables, the flush proceeds as follows:
<snip/>
a. within a particular table, INSERT operations are processed in the order in which objects were add()'ed
b. within a particular table, UPDATE and DELETE operations are processed in primary key order
So if you have the following within a single transaction:
INSERT x
DELETE x
INSERT x
when you commit it, it's probably getting reordered as:
INSERT x
INSERT x
DELETE x
I have more experience with this problem in Java/hibernate. The SQLAlchemy docs do claim it's unit of work pattern is "Modeled after Fowler's "Unit of Work" pattern as well as Hibernate, Java's leading object-relational mapper." so probably relevant here too
To supplement Ed Brook's insightful answer, you can work around the problem by flushing the session after deleting the record:
with Session() as s, s.begin():
u = s.scalars(sa.select(User).where(User.user == 'a')).first()
s.delete(u)
s.flush()
s.add(User(user='a'))
Another solution would be to use a deferred constraint, so that the state of the index is not evaluated until the end of the transaction:
class User(Base):
...
__table_args__ = (
sa.UniqueConstraint('user', deferrable=True, initially='deferred'),
)
but note, from the PostgreSQL documentation:
deferrable constraints cannot be used as conflict arbitrators in an INSERT statement that includes an ON CONFLICT DO UPDATE clause.

UPDATE from temp table picking the "last" row per group

Suppose there is a table with data:
+----+-------+
| id | value |
+----+-------+
| 1 | 0 |
| 2 | 0 |
+----+-------+
I need to do a bulk update. And use COPY FROM STDIN for fast insert to temp table without constraints and so it can contains duplicate values in id column
Temp table to update from:
+----+-------+
| id | value |
+----+-------+
| 1 | 1 |
| 2 | 1 |
| 1 | 2 |
| 2 | 2 |
+----+-------+
If I simply run a query like with:
UPDATE test target SET value = source.value FROM tmp_test source WHERE target.id = source.id;
I got wrong results:
+----+-------+
| id | value |
+----+-------+
| 1 | 1 |
| 2 | 1 |
+----+-------+
I need the target table to contain the values that appeared last in the temporary table.
What is the most effective way to do this, given that the target table may contain millions of records, and the temporary table may contain tens of thousands?**
Assuming you want to take the value from the row that was inserted last into the temp table, physically, you can (ab-)use the system column ctid, signifying the physical location:
UPDATE test AS target
SET value = source.value
FROM (
SELECT DISTINCT ON (id)
id, value
FROM tmp_test
ORDER BY id, ctid DESC
) source
WHERE target.id = source.id
AND target.value <> source.value; -- skip empty updates
About DISTINCT ON:
Select first row in each GROUP BY group?
This builds on a implementation detail, and is not backed up by the SQL standard. If some insert method should not write rows in sequence (like future "parallel" INSERT), it breaks. Currently, it should work. About ctid:
How do I decompose ctid into page and row numbers?
If you want a safe way, you need to add some user column to signify the order of rows, like a serial column. But do your really care? Your tiebreaker seems rather arbitrary. See:
Temporary sequence within a SELECT
AND target.value <> source.value
skips empty updates - assuming both columns are NOT NULL. Else, use:
AND target.value IS DISTINCT FROM source.value
See:
How do I (or can I) SELECT DISTINCT on multiple columns?

How to understand/debug bad index result on slave instance

I have the following schema and 2 postgresql instances with a slave instance replicating a master instance.
CREATE TABLE t (id serial PRIMARY KEY, c text);
CREATE INDEX ON t (upper(c));
I get this incorrect result on the slave instance.
# SELECT id, c, upper(c), upper(c) = upper('FOO') FROM t WHERE id IN (123, 456);
id | c | upper | ?column?
-----+-----+-------+----------
123 | Foo | FOO | t
456 | foo | FOO | t
(2 rows)
# SELECT id, c, upper(c), upper(c) = upper('FOO') FROM t WHERE upper(c) = upper('FOO');
id | c | upper | ?column?
----+---+-------+----------
(0 rows)
The second query should return the same rows as the first query.
However, the result is correct on the master instance.
# SELECT id, c, upper(c), upper(c) = upper('FOO') FROM t WHERE id IN (123, 456);
id | c | upper | ?column?
-----+-----+-------+----------
123 | Foo | FOO | t
456 | foo | FOO | t
(2 rows)
# SELECT id, c, upper(c), upper(c) = upper('FOO') FROM t WHERE upper(c) = upper('FOO');
id | c | upper | ?column?
-----+-----+-------+----------
123 | Foo | FOO | t
456 | foo | FOO | t
(2 rows)
Using EXPLAIN on the second query, I can see that it's using the index as expected, so I suspect the index data is somehow incorrect on the slave instance. Doing a REINDEX on the master instance does not resolve the issue and doing it on the slave instance is not possible because of the replication.
Is it possible that the index data is correct on the master instance and incorrect on the slave instance? How to further debug the issue?
UPDATE: This is the query plan of the second query on both the master and the slave instance
Index Scan using t_upper_idx on t (cost=0.43..8.46 rows=1 width=60)
Index Cond: (upper((c)::text) = 'FOO'::text)
There are ~3M rows in the t table.
UPDATE: Server version is server 11.4 (Debian 11.4-1.pgdg90+1)) on the master, and server 11.7 (Debian 11.7-0+deb10u1)) on the slave.

How to order rows with linked parts in PostgreSQL

I have a table A with columns: id, title, condition
And i have another table B with information about position for some rows from table A. Table B have columns id, next_id, prev_id
How to sort rows from A based on information from table B?
For example,
Table A
id| title
---+-----
1 | title1
2 | title2
3 | title3
4 | title4
5 | title5
Table B
id| next_id | prev_id
---+-----
2 | 1 | null
5 | 4 | 3
I want to get this result:
id| title
---+-----
2 | title2
1 | title1
3 | title3
5 | title5
4 | title4
And after apply this sort, i want to sort by condition column yet.
I've already spent a lot of time looking for a solution, and hope for your help.
You have to add weights to your data, so you can order accordingly. This example uses next_id, not sure if you need to use prev_id, you don't explain the use of it.
Anyway, here's a code example:
-- Temporal Data for the test:
CREATE TEMP TABLE table_a(id integer,tittle text);
CREATE TEMP TABLE table_b(id integer,next_id integer, prev_id integer);
INSERT INTO table_a VALUES
(1,'title1'),
(2,'title2'),
(3,'title3'),
(4,'title4'),
(5,'title5');
INSERT INTO table_b VALUES
(2,1,null),
(5,4,3);
-- QUERY:
SELECT
id,tittle,
CASE -- Adding weight
WHEN next_id IS NULL THEN (id + 0.1)
ELSE next_id
END AS orden
FROM -- Joining tables
(SELECT ta.*,tb.next_id
FROM table_a ta
LEFT JOIN table_b tb
ON ta.id=tb.id)join_a_b
ORDER BY orden
And here's the result:
id | tittle | orden
--------------------------
2 | title2 | 1
1 | title1 | 1.1
3 | title3 | 3.1
5 | title5 | 4
4 | title4 | 4.1

Update intermediate result

EDIT
As requested a little background of what I want to achieve. I have a table that I want to query but I don't want to change the table itself. Next the result of the SELECT query (what I called the 'intermediate table') needs to be cleaned a bit. For example certain cells of certain rows need to be swapped and some strings need to be trimmed. Of course this could all be done as postprocessing in, e.g., Python, but I was hoping to do all of this with one query statement.
Being new to Postgresql I want to update the intermediate table that results from a SELECT statement. So I basically want to edit the resulting table from a SELECT statement in one query. I'd like to prevent having to store the intermediate result.
I've tried the following 'with clause':
with result as (
select
a
from
b
)
update result as r
set
a = 'd'
...but that results in ERROR: relation "result" does not exist, while the following does work:
with result as (
select
a
from
b
)
select
*
from
result
As I said, I'm new to Postgresql so it is entirely possible that I'm using the wrong approach.
Depending on the complexity of the transformations you want to perform, you might be able to munge it into the SELECT, which would let you get away with a single query:
WITH foo AS (SELECT lower(name), freq, cumfreq, rank, vec FROM names WHERE name LIKE 'G%')
SELECT ... FROM foo WHERE ...
Or, for more or less unlimited manipulation options, you could create a temp table that will disappear at the end of the current transaction. That doesn't get the job done in a single query, but it does get it all done on the SQL server, which might still be worthwhile.
db=# BEGIN;
BEGIN
db=# CREATE TEMP TABLE foo ON COMMIT DROP AS SELECT * FROM names WHERE name LIKE 'G%';
SELECT 4677
db=# SELECT * FROM foo LIMIT 5;
name | freq | cumfreq | rank | vec
----------+-------+---------+------+-----------------------
GREEN | 0.183 | 11.403 | 35 | 'KRN':1 'green':1
GONZALEZ | 0.166 | 11.915 | 38 | 'KNSL':1 'gonzalez':1
GRAY | 0.106 | 15.921 | 69 | 'KR':1 'gray':1
GONZALES | 0.087 | 18.318 | 94 | 'KNSL':1 'gonzales':1
GRIFFIN | 0.084 | 18.659 | 98 | 'KRFN':1 'griffin':1
(5 rows)
db=# UPDATE foo SET name = lower(name);
UPDATE 4677
db=# SELECT * FROM foo LIMIT 5;
name | freq | cumfreq | rank | vec
--------+-------+---------+-------+---------------------
grube | 0.002 | 67.691 | 7333 | 'KRP':1 'grube':1
gasper | 0.001 | 69.999 | 9027 | 'KSPR':1 'gasper':1
gori | 0.000 | 81.360 | 28946 | 'KR':1 'gori':1
goeltz | 0.000 | 85.471 | 47269 | 'KLTS':1 'goeltz':1
gani | 0.000 | 86.202 | 51743 | 'KN':1 'gani':1
(5 rows)
db=# COMMIT;
COMMIT
db=# SELECT * FROM foo;
ERROR: relation "foo" does not exist