How to understand/debug bad index result on slave instance - postgresql

I have the following schema and two PostgreSQL instances, with a slave instance replicating a master instance.
CREATE TABLE t (id serial PRIMARY KEY, c text);
CREATE INDEX ON t (upper(c));
I get this incorrect result on the slave instance.
# SELECT id, c, upper(c), upper(c) = upper('FOO') FROM t WHERE id IN (123, 456);
id | c | upper | ?column?
-----+-----+-------+----------
123 | Foo | FOO | t
456 | foo | FOO | t
(2 rows)
# SELECT id, c, upper(c), upper(c) = upper('FOO') FROM t WHERE upper(c) = upper('FOO');
id | c | upper | ?column?
----+---+-------+----------
(0 rows)
The second query should return the same rows as the first query.
However, the result is correct on the master instance.
# SELECT id, c, upper(c), upper(c) = upper('FOO') FROM t WHERE id IN (123, 456);
id | c | upper | ?column?
-----+-----+-------+----------
123 | Foo | FOO | t
456 | foo | FOO | t
(2 rows)
# SELECT id, c, upper(c), upper(c) = upper('FOO') FROM t WHERE upper(c) = upper('FOO');
id | c | upper | ?column?
-----+-----+-------+----------
123 | Foo | FOO | t
456 | foo | FOO | t
(2 rows)
Using EXPLAIN on the second query, I can see that it uses the index as expected, so I suspect the index data is somehow corrupt on the slave instance. Running REINDEX on the master instance does not resolve the issue, and running it on the slave is not possible because of the replication.
Is it possible that the index data is correct on the master instance and incorrect on the slave instance? How can I debug this further?
UPDATE: This is the query plan of the second query on both the master and the slave instance:
Index Scan using t_upper_idx on t (cost=0.43..8.46 rows=1 width=60)
Index Cond: (upper((c)::text) = 'FOO'::text)
There are ~3M rows in the t table.
UPDATE: The server version is 11.4 (Debian 11.4-1.pgdg90+1) on the master and 11.7 (Debian 11.7-0+deb10u1) on the slave.
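One way to test the corrupt-index suspicion is the contrib module amcheck (available since PostgreSQL 10), which verifies a B-tree index without modifying it. This is a sketch: the index name t_upper_idx is taken from the query plan above, and CREATE EXTENSION must run on the master (a hot standby is read-only) and then replicates to the slave, where the read-only check itself can run. Also worth noting: the two packages are built for Debian 9 and Debian 10 respectively, which ship glibc versions with different collation behavior, a known cause of corrupt text indexes on replicas.

```sql
-- On the master (the CREATE EXTENSION replicates to the slave):
CREATE EXTENSION IF NOT EXISTS amcheck;

-- On the slave: read-only structural check of the suspect index.
-- Raises an error if the B-tree invariants are violated (e.g. mis-sorted keys).
SELECT bt_index_check('t_upper_idx');
```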

Related

Postgres - Same query returns different results when run again and again

While experimenting with PostgreSQL for my new project, I observed the following query execution behavior. Why does the same query return different rows when run again and again? What is the rationale behind this behavior?
phi=# SELECT P.id, p.resource ->> 'birthDate' BD FROM recorditems P WHERE P.resource #> '{"resourceType":"Patient", "gender":"male"}' AND To_date(P.resource ->> 'birthDate', 'YYYY-MM-DD') > '1975-01-01'::date limit 10;
id | bd
--------+------------
363661 | 1990-03-08
363752 | 2006-02-28
364971 | 2017-10-21
365330 | 1996-11-25
367793 | 2007-10-02
369002 | 2006-09-04
369172 | 1983-09-10
369256 | 2001-05-19
369670 | 1992-03-21
372082 | 2011-07-27
(10 rows)
Time: 15.085 ms
phi=# SELECT P.id, p.resource ->> 'birthDate' BD FROM recorditems P WHERE P.resource #> '{"resourceType":"Patient", "gender":"male"}' AND To_date(P.resource ->> 'birthDate', 'YYYY-MM-DD') > '1975-01-01'::date limit 10;
id | bd
--------+------------
372082 | 2011-07-27
372645 | 1988-11-02
373528 | 1984-07-11
376213 | 1982-01-03
377386 | 1995-01-20
377531 | 2002-02-11
377717 | 1991-11-15
378372 | 2018-09-27
378483 | 2009-01-11
378743 | 1996-02-27
(10 rows)
Time: 18.163 ms
phi=# SELECT P.id, p.resource ->> 'birthDate' BD FROM recorditems P WHERE P.resource #> '{"resourceType":"Patient", "gender":"male"}' AND To_date(P.resource ->> 'birthDate', 'YYYY-MM-DD') > '1975-01-01'::date limit 10;
id | bd
--------+------------
378743 | 1996-02-27
382517 | 1992-01-14
387866 | 1985-07-03
388180 | 1976-11-01
388627 | 1996-07-10
396668 | 1979-03-29
396754 | 2013-05-16
397054 | 1998-01-05
401771 | 1983-11-28
401891 | 2019-03-01
(10 rows)
Time: 44.394 ms
You are not sorting your results. A LIMIT without an ORDER BY returns (seemingly) random rows, because the rows of a table have no inherent order.
If you want consistent results with LIMIT, you have to use an ORDER BY as well.
When you have multiple sequential scans running against the same table at more or less the same time (loosely defined), PostgreSQL tries to synchronize them so that they can all benefit from the same warmed-up cache. If you turn off this feature with SET synchronize_seqscans TO off, you will probably get more predictable results. But your original sin here is expecting an order when you didn't request one.
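Applied to the query from the question, a deterministic variant would look like this (a sketch; it also uses the containment operator @>, since the #> shown above returns jsonb rather than a boolean and containment is presumably what was meant):

```sql
SELECT p.id, p.resource ->> 'birthDate' AS bd
FROM recorditems p
WHERE p.resource @> '{"resourceType": "Patient", "gender": "male"}'
  AND to_date(p.resource ->> 'birthDate', 'YYYY-MM-DD') > DATE '1975-01-01'
ORDER BY p.id   -- any stable sort key works; without it, LIMIT returns arbitrary rows
LIMIT 10;
```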

Locks on updating rows with foreign key constraint

I executed the same UPDATE query twice, as shown below.
The first time, the transaction holds no row lock, but I can see a row lock after the second query.
Schema:
test=# \d t1
Table "public.t1"
Column | Type | Collation | Nullable | Default
--------+---------+-----------+----------+---------
i | integer | | not null |
j | integer | | |
Indexes:
"t1_pkey" PRIMARY KEY, btree (i)
Referenced by:
TABLE "t2" CONSTRAINT "t2_j_fkey" FOREIGN KEY (j) REFERENCES t1(i)
test=# \d t2
Table "public.t2"
Column | Type | Collation | Nullable | Default
--------+---------+-----------+----------+---------
i | integer | | not null |
j | integer | | |
k | integer | | |
Indexes:
"t2_pkey" PRIMARY KEY, btree (i)
Foreign-key constraints:
"t2_j_fkey" FOREIGN KEY (j) REFERENCES t1(i)
Existing data:
test=# SELECT * FROM t1 ORDER BY i;
i | j
---+---
1 | 1
2 | 2
(2 rows)
test=# SELECT * FROM t2 ORDER BY i;
i | j | k
---+---+---
3 | 1 |
4 | 2 |
(2 rows)
UPDATE queries and row lock status:
test=# BEGIN;
BEGIN
test=# UPDATE t2 SET k = 123 WHERE i = 3;
UPDATE 1
test=# SELECT * FROM t1 AS t, pgrowlocks('t1') AS p WHERE p.locked_row = t.ctid;
i | j | locked_row | locker | multi | xids | modes | pids
---+---+------------+--------+-------+------+-------+------
(0 rows)
test=# UPDATE t2 SET k = 123 WHERE i = 3;
UPDATE 1
test=# SELECT * FROM t1 AS t, pgrowlocks('t1') AS p WHERE p.locked_row = t.ctid;
i | j | locked_row | locker | multi | xids | modes | pids
---+---+------------+--------+-------+----------+-------------------+------
1 | 1 | (0,1) | 107239 | f | {107239} | {"For Key Share"} | {76}
(1 row)
test=#
Why does Postgres take a row lock on t1 only the second time?
By the way, queries that update column t2.j take a new lock (For Key Share) on the t1 row immediately. That behavior makes sense, because t2.j has a foreign key constraint referencing t1.i. But the queries above do not touch t2.j.
Can anyone explain this lock?
PostgreSQL version: 9.6.3
Okay, I got it.
http://blog.nordeus.com/dev-ops/postgresql-locking-revealed.htm
This is an optimization in Postgres. If the lock manager can figure out from the first query that the foreign key is not changed (the column is not mentioned in the UPDATE, or it is set to the same value), it does not lock the parent table. But for the second query it behaves as described in the documentation: it locks the parent table in ROW SHARE mode and the referenced row in FOR KEY SHARE mode.
It seems MySQL is smarter about foreign key locks, because the same UPDATE query does not take such locks on MySQL.
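The contrast is easy to reproduce against the same schema (a sketch, run in a fresh transaction): an UPDATE that assigns a new value to the FK column t2.j takes the parent-row lock immediately, whereas the first UPDATE of t2.k above did not.

```sql
BEGIN;
UPDATE t2 SET j = 2 WHERE i = 3;  -- the FK column changes, so the FK check fires
-- The referenced t1 row should now show a "For Key Share" lock right away:
SELECT * FROM t1 AS t, pgrowlocks('t1') AS p WHERE p.locked_row = t.ctid;
ROLLBACK;
```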

PostgreSQL function for each row

I have a query which returns a table like this:
id | VAL | Type
-- | --- | ----
1 | 10 | A
2 | 20 | B
3 | 30 | C
4 | 40 | B
I want to call some function for each row that checks the type and does different work depending on it, like an if in C#:
if(type=='A'){}
if(type=='B'){}
if(type=='C'){}
How can I do this in PostgreSQL using only SQL?
In standard SQL you can use a CASE expression here:
SELECT id, val, "type",
       CASE "type"
           WHEN 'A' THEN funcA()
           WHEN 'B' THEN funcB()
           WHEN 'C' THEN funcC()
       END AS func_result
FROM <table>;
All functions should return a scalar value (a single value).

Update intermediate result

EDIT
As requested a little background of what I want to achieve. I have a table that I want to query but I don't want to change the table itself. Next the result of the SELECT query (what I called the 'intermediate table') needs to be cleaned a bit. For example certain cells of certain rows need to be swapped and some strings need to be trimmed. Of course this could all be done as postprocessing in, e.g., Python, but I was hoping to do all of this with one query statement.
Being new to PostgreSQL, I want to update the intermediate table that results from a SELECT statement; basically, I want to edit the result of a SELECT in one query. I'd like to avoid having to store the intermediate result.
I've tried the following 'with clause':
with result as (
select
a
from
b
)
update result as r
set
a = 'd'
...but that results in ERROR: relation "result" does not exist, while the following does work:
with result as (
select
a
from
b
)
select
*
from
result
As I said, I'm new to Postgresql so it is entirely possible that I'm using the wrong approach.
Depending on the complexity of the transformations you want to perform, you might be able to munge it into the SELECT, which would let you get away with a single query:
WITH foo AS (SELECT lower(name), freq, cumfreq, rank, vec FROM names WHERE name LIKE 'G%')
SELECT ... FROM foo WHERE ...
Or, for more or less unlimited manipulation options, you could create a temp table that will disappear at the end of the current transaction. That doesn't get the job done in a single query, but it does get it all done on the SQL server, which might still be worthwhile.
db=# BEGIN;
BEGIN
db=# CREATE TEMP TABLE foo ON COMMIT DROP AS SELECT * FROM names WHERE name LIKE 'G%';
SELECT 4677
db=# SELECT * FROM foo LIMIT 5;
name | freq | cumfreq | rank | vec
----------+-------+---------+------+-----------------------
GREEN | 0.183 | 11.403 | 35 | 'KRN':1 'green':1
GONZALEZ | 0.166 | 11.915 | 38 | 'KNSL':1 'gonzalez':1
GRAY | 0.106 | 15.921 | 69 | 'KR':1 'gray':1
GONZALES | 0.087 | 18.318 | 94 | 'KNSL':1 'gonzales':1
GRIFFIN | 0.084 | 18.659 | 98 | 'KRFN':1 'griffin':1
(5 rows)
db=# UPDATE foo SET name = lower(name);
UPDATE 4677
db=# SELECT * FROM foo LIMIT 5;
name | freq | cumfreq | rank | vec
--------+-------+---------+-------+---------------------
grube | 0.002 | 67.691 | 7333 | 'KRP':1 'grube':1
gasper | 0.001 | 69.999 | 9027 | 'KSPR':1 'gasper':1
gori | 0.000 | 81.360 | 28946 | 'KR':1 'gori':1
goeltz | 0.000 | 85.471 | 47269 | 'KLTS':1 'goeltz':1
gani | 0.000 | 86.202 | 51743 | 'KN':1 'gani':1
(5 rows)
db=# COMMIT;
COMMIT
db=# SELECT * FROM foo;
ERROR: relation "foo" does not exist

Query join result appears to be incorrect

I have no idea what's going on here. Maybe I've been staring at this code for too long.
The query I have is as follows:
CREATE VIEW v_sku_best_before AS
SELECT
sw.sku_id,
sw.sku_warehouse_id "A",
sbb.sku_warehouse_id "B",
sbb.best_before,
sbb.quantity
FROM SKU_WAREHOUSE sw
LEFT OUTER JOIN SKU_BEST_BEFORE sbb
ON sbb.sku_warehouse_id = sw.warehouse_id
ORDER BY sbb.best_before
I can post the table definitions if that helps, but I'm not sure it will. Suffice it to say that SKU_WAREHOUSE.sku_warehouse_id is an identity column, and SKU_BEST_BEFORE.sku_warehouse_id is a child column that references that identity as a foreign key.
Here's the result when I run the query:
+--------+-----+----+-------------+----------+
| sku_id | A | B | best_before | quantity |
+--------+-----+----+-------------+----------+
| 20251 | 643 | 11 | <<null>> | 140 |
+--------+-----+----+-------------+----------+
(1 row)
The join specifies that the sku_warehouse_id columns have to be equal, but when I pull the ID from each table (labelled as A and B) they're different.
What am I doing wrong?
Perhaps just sw.sku_warehouse_id instead of sw.warehouse_id?
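For completeness, here is the view with that fix applied (a sketch; it assumes the column is indeed named sku_warehouse_id in SKU_WAREHOUSE, as the question states):

```sql
CREATE VIEW v_sku_best_before AS
SELECT
    sw.sku_id,
    sw.sku_warehouse_id "A",
    sbb.sku_warehouse_id "B",
    sbb.best_before,
    sbb.quantity
FROM sku_warehouse sw
LEFT OUTER JOIN sku_best_before sbb
    ON sbb.sku_warehouse_id = sw.sku_warehouse_id  -- was sw.warehouse_id, a different column
ORDER BY sbb.best_before;
```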