Postgres - Same query returns different results when run again and again - postgresql

While experimenting PostgreSQL for my new project, I have experienced the following query execution behavior. I would like to know why the same query returns different values when run again and again? What is the rational behind this behavior?
phi=# SELECT P.id, p.resource ->> 'birthDate' BD FROM recorditems P WHERE P.resource #> '{"resourceType":"Patient", "gender":"male"}' AND To_date(P.resource ->> 'birthDate', 'YYYY-MM-DD') > '1975-01-01'::date limit 10;
id | bd
--------+------------
363661 | 1990-03-08
363752 | 2006-02-28
364971 | 2017-10-21
365330 | 1996-11-25
367793 | 2007-10-02
369002 | 2006-09-04
369172 | 1983-09-10
369256 | 2001-05-19
369670 | 1992-03-21
372082 | 2011-07-27
(10 rows)
Time: 15.085 ms
phi=# SELECT P.id, p.resource ->> 'birthDate' BD FROM recorditems P WHERE P.resource #> '{"resourceType":"Patient", "gender":"male"}' AND To_date(P.resource ->> 'birthDate', 'YYYY-MM-DD') > '1975-01-01'::date limit 10;
id | bd
--------+------------
372082 | 2011-07-27
372645 | 1988-11-02
373528 | 1984-07-11
376213 | 1982-01-03
377386 | 1995-01-20
377531 | 2002-02-11
377717 | 1991-11-15
378372 | 2018-09-27
378483 | 2009-01-11
378743 | 1996-02-27
(10 rows)
Time: 18.163 ms
phi=# SELECT P.id, p.resource ->> 'birthDate' BD FROM recorditems P WHERE P.resource #> '{"resourceType":"Patient", "gender":"male"}' AND To_date(P.resource ->> 'birthDate', 'YYYY-MM-DD') > '1975-01-01'::date limit 10;
id | bd
--------+------------
378743 | 1996-02-27
382517 | 1992-01-14
387866 | 1985-07-03
388180 | 1976-11-01
388627 | 1996-07-10
396668 | 1979-03-29
396754 | 2013-05-16
397054 | 1998-01-05
401771 | 1983-11-28
401891 | 2019-03-01
(10 rows)
Time: 44.394 ms

You are not sorting your results. A LIMIT without an ORDER BY will return (seemingly) random results as the rows of a table are not sorted.
If you want to get consistent results using LIMIT you have to use an ORDER BY as well.

When you have multiple sequential scans going against the same table at more or less the same time (loosely defined), PostgreSQL tries to synchronize them so that they can all benefit from the same warmed-up cache. If you turn off this feature with set synchronize_seqscans TO off, you will probably get more predictable results. But your original sin here is expecting an order, when you didn't request one.

Related

Given a row representing a path, union a total column

Say I have a table like the following table that represents a path from 1 -> 2 -> 3 -> 4 -> 5:
+------+----+--------+
| from | to | weight |
+------+----+--------+
| a | b | 1 |
| b | c | 2 |
| c | d | 1 |
| d | e | 1 |
| e | f | 3 |
+------+----+--------+
Each row knows where it came from and where it is going
I would like to union a total row that takes the starting name, ending name, and a total weight like so:
+------+----+--------+
| from | to | weight |
+------+----+--------+
| a | f | 8 |
+------+----+--------+
The first table is a result of a CTE expression, and I can easily get the total of the previous query with SUM, but I'm unable to get the LAST_VALUE to work in a similar way to:
WITH RECURSIVE cte AS (
...
)
SELECT *
FROM cte
UNION ALL
SELECT 'total', FIRST_VALUE(from), LAST_VALUE(to), SUM(weight)
FROM cte
The FIRST_VALUE and LAST_VALUE functions require OVER clauses which seem to add unnecessary complications to what I would expect, so I think I am going the wrong direction with that. Any ideas on how to achieve this?
So I made a strange solution that:
Selects the first from value (partitioned by TRUE)
Selects the last to value (partitioned by TRUE again)
Cross joins the sum of all weights, limited to 1
WITH RECURSIVE cte AS (
...
)
SELECT *
FROM cte
UNION ALL (
SELECT FIRST_VALUE(from) OVER (PARTITION BY TRUE), LAST_VALUE(to) OVER (PARTITION BY TRUE), total
FROM cte
CROSS JOIN (
SELECT SUM(weight) as total
FROM cte
) tmp
LIMIT 1
);
Is it hacky? Yes. Does it work? Also yes. I'm sure there are better solutions, and I would love to hear them.

SELECT DISTINCT on a ordered subquery's table

I'm working on a problem, involving these two tables.
books
isbn | title | author
------------+-----------------------------------------+------------------
1840918626 | Hogwarts: A History | Bathilda Bagshot
3458400871 | Fantastic Beasts and Where to Find Them | Newt Scamander
9136884926 | Advanced Potion-Making | Libatius Borage
transactions
id | patron_id | isbn | checked_out_date | checked_in_date
----+-----------+------------+------------------+-----------------
1 | 1 | 1840918626 | 2012-05-05 | 2012-05-06
2 | 4 | 9136884926 | 2012-05-05 | 2012-05-06
3 | 2 | 3458400871 | 2012-05-05 | 2012-05-06
4 | 3 | 3458400871 | 2018-04-29 | 2018-05-02
5 | 2 | 9136884926 | 2018-05-03 | NULL
6 | 1 | 3458400871 | 2018-05-03 | 2018-05-05
7 | 5 | 3458400871 | 2018-05-05 | NULL
the query "Make a list of all book titles and denote whether or not a copy of that book is checked out." so pretty much just the first table with a checked out column.
im trying to SELECT DISTINCT on a sub query with the checkout books first, but that doesn't work. I've researched and others say to accomplish this use a GROUP BY clause instead of DISTINCT but the examples they provide are one column queries and when more columns are added it doesn't work.
this is my closest attempt
SELECT DISTINCT ON (title)
title, checked_out
FROM(
SELECT b.title, t.checked_in_date IS NULL AS checked_out
FROM transactions t
natural join books b
ORDER BY checked_out DESC
) t;
or you can join only transactions where books are not checked in:
SELECT b.title, t.isbn IS NOT NULL AS checked_out
, t.checked_out_date
FROM books b
LEFT JOIN transactions t ON t.isbn = b.isbn AND t.checked_in_date IS NULL
ORDER BY checked_out DESC
I adjusted your attempt a little bit. Basically I changed the way your data is joined
SELECT DISTINCT ON (title)
title, checked_out
FROM(
SELECT b.title, t.checked_in_date IS NULL AS checked_out
FROM books b
LEFT OUTER JOIN transactions t USING (isbn)
ORDER BY checked_out DESC
) t;

Update intermediate result

EDIT
As requested a little background of what I want to achieve. I have a table that I want to query but I don't want to change the table itself. Next the result of the SELECT query (what I called the 'intermediate table') needs to be cleaned a bit. For example certain cells of certain rows need to be swapped and some strings need to be trimmed. Of course this could all be done as postprocessing in, e.g., Python, but I was hoping to do all of this with one query statement.
Being new to Postgresql I want to update the intermediate table that results from a SELECT statement. So I basically want to edit the resulting table from a SELECT statement in one query. I'd like to prevent having to store the intermediate result.
I've tried the following 'with clause':
with result as (
select
a
from
b
)
update result as r
set
a = 'd'
...but that results in ERROR: relation "result" does not exist, while the following does work:
with result as (
select
a
from
b
)
select
*
from
result
As I said, I'm new to Postgresql so it is entirely possible that I'm using the wrong approach.
Depending on the complexity of the transformations you want to perform, you might be able to munge it into the SELECT, which would let you get away with a single query:
WITH foo AS (SELECT lower(name), freq, cumfreq, rank, vec FROM names WHERE name LIKE 'G%')
SELECT ... FROM foo WHERE ...
Or, for more or less unlimited manipulation options, you could create a temp table that will disappear at the end of the current transaction. That doesn't get the job done in a single query, but it does get it all done on the SQL server, which might still be worthwhile.
db=# BEGIN;
BEGIN
db=# CREATE TEMP TABLE foo ON COMMIT DROP AS SELECT * FROM names WHERE name LIKE 'G%';
SELECT 4677
db=# SELECT * FROM foo LIMIT 5;
name | freq | cumfreq | rank | vec
----------+-------+---------+------+-----------------------
GREEN | 0.183 | 11.403 | 35 | 'KRN':1 'green':1
GONZALEZ | 0.166 | 11.915 | 38 | 'KNSL':1 'gonzalez':1
GRAY | 0.106 | 15.921 | 69 | 'KR':1 'gray':1
GONZALES | 0.087 | 18.318 | 94 | 'KNSL':1 'gonzales':1
GRIFFIN | 0.084 | 18.659 | 98 | 'KRFN':1 'griffin':1
(5 rows)
db=# UPDATE foo SET name = lower(name);
UPDATE 4677
db=# SELECT * FROM foo LIMIT 5;
name | freq | cumfreq | rank | vec
--------+-------+---------+-------+---------------------
grube | 0.002 | 67.691 | 7333 | 'KRP':1 'grube':1
gasper | 0.001 | 69.999 | 9027 | 'KSPR':1 'gasper':1
gori | 0.000 | 81.360 | 28946 | 'KR':1 'gori':1
goeltz | 0.000 | 85.471 | 47269 | 'KLTS':1 'goeltz':1
gani | 0.000 | 86.202 | 51743 | 'KN':1 'gani':1
(5 rows)
db=# COMMIT;
COMMIT
db=# SELECT * FROM foo;
ERROR: relation "foo" does not exist

Crafting a T-SQL Query

I have an admittedly novice question about a T-SQL query (which makes sense since I am indeed a novice when it comes to T-SQL).
Consider the following table --
Key | fieldName | Value
==============================
465 | Bing | 10
465 | Ping | 50
846 | Bing | 20
846 | Zing | 80
678 | Bing | 10
678 | Ping | 50
678 | Zing | 20
How would I compose a query to return the following?
If there exists a row with the fieldName Bing and Value of 10, return all of the rows with that key, otherwise don't return any rows pertaining to that key.
In the above example, the result set should be as follows --
Key | fieldName | Value
==============================
465 | Bing | 10
465 | Ping | 50
678 | Bing | 10
678 | Ping | 50
678 | Zing | 20
While I understand that there are likely ways better ways to reorganize the data stored in this table, I do not have control over this. I'm happy to read any comments regarding the reorganization of the data, but I can't mark anything an answer that doesn't solve the problem as it currently exists.
You can join on the table again to find the Bing/10 values:
SELECT DISTINCT T1.[Key], T1.fieldName, T1.Value
FROM YourTable T1
INNER JOIN YourTable T2 ON T1.[Key] = T2.[Key]
WHERE T2.fieldName = 'Bing' and T2.Value = 10
And because they're all the rage right now, here's a SQL Fiddle demonstration.
Another options would be:
SELECT DISTINCT T1.[Key], T1.fieldName, T1.Value
FROM YourTable T1
WHERE EXISTS (SELECT 1
FROM YourTable T2
WHERE T2.[Key] = T1.[Key]
AND T2.fieldName = 'Bing'
AND T2.Value = 10)

Equivalent to unpivot() in PostgreSQL

Is there a unpivot equivalent function in PostgreSQL?
Create an example table:
CREATE TEMP TABLE foo (id int, a text, b text, c text);
INSERT INTO foo VALUES (1, 'ant', 'cat', 'chimp'), (2, 'grape', 'mint', 'basil');
You can 'unpivot' or 'uncrosstab' using UNION ALL:
SELECT id,
'a' AS colname,
a AS thing
FROM foo
UNION ALL
SELECT id,
'b' AS colname,
b AS thing
FROM foo
UNION ALL
SELECT id,
'c' AS colname,
c AS thing
FROM foo
ORDER BY id;
This runs 3 different subqueries on foo, one for each column we want to unpivot, and returns, in one table, every record from each of the subqueries.
But that will scan the table N times, where N is the number of columns you want to unpivot. This is inefficient, and a big problem when, for example, you're working with a very large table that takes a long time to scan.
Instead, use:
SELECT id,
unnest(array['a', 'b', 'c']) AS colname,
unnest(array[a, b, c]) AS thing
FROM foo
ORDER BY id;
This is easier to write, and it will only scan the table once.
array[a, b, c] returns an array object, with the values of a, b, and c as it's elements.
unnest(array[a, b, c]) breaks the results into one row for each of the array's elements.
You could use VALUES() and JOIN LATERAL to unpivot the columns.
Sample data:
CREATE TABLE test(id int, a INT, b INT, c INT);
INSERT INTO test(id,a,b,c) VALUES (1,11,12,13),(2,21,22,23),(3,31,32,33);
Query:
SELECT t.id, s.col_name, s.col_value
FROM test t
JOIN LATERAL(VALUES('a',t.a),('b',t.b),('c',t.c)) s(col_name, col_value) ON TRUE;
DBFiddle Demo
Using this approach it is possible to unpivot multiple groups of columns at once.
EDIT
Using Zack's suggestion:
SELECT t.id, col_name, col_value
FROM test t
CROSS JOIN LATERAL (VALUES('a', t.a),('b', t.b),('c',t.c)) s(col_name, col_value);
<=>
SELECT t.id, col_name, col_value
FROM test t
,LATERAL (VALUES('a', t.a),('b', t.b),('c',t.c)) s(col_name, col_value);
db<>fiddle demo
Great article by Thomas Kellerer found here
Unpivot with Postgres
Sometimes it’s necessary to normalize de-normalized tables - the opposite of a “crosstab” or “pivot” operation. Postgres does not support an UNPIVOT operator like Oracle or SQL Server, but simulating it, is very simple.
Take the following table that stores aggregated values per quarter:
create table customer_turnover
(
customer_id integer,
q1 integer,
q2 integer,
q3 integer,
q4 integer
);
And the following sample data:
customer_id | q1 | q2 | q3 | q4
------------+-----+-----+-----+----
1 | 100 | 210 | 203 | 304
2 | 150 | 118 | 422 | 257
3 | 220 | 311 | 271 | 269
But we want the quarters to be rows (as they should be in a normalized data model).
In Oracle or SQL Server this could be achieved with the UNPIVOT operator, but that is not available in Postgres. However Postgres’ ability to use the VALUES clause like a table makes this actually quite easy:
select c.customer_id, t.*
from customer_turnover c
cross join lateral (
values
(c.q1, 'Q1'),
(c.q2, 'Q2'),
(c.q3, 'Q3'),
(c.q4, 'Q4')
) as t(turnover, quarter)
order by customer_id, quarter;
will return the following result:
customer_id | turnover | quarter
------------+----------+--------
1 | 100 | Q1
1 | 210 | Q2
1 | 203 | Q3
1 | 304 | Q4
2 | 150 | Q1
2 | 118 | Q2
2 | 422 | Q3
2 | 257 | Q4
3 | 220 | Q1
3 | 311 | Q2
3 | 271 | Q3
3 | 269 | Q4
The equivalent query with the standard UNPIVOT operator would be:
select customer_id, turnover, quarter
from customer_turnover c
UNPIVOT (turnover for quarter in (q1 as 'Q1',
q2 as 'Q2',
q3 as 'Q3',
q4 as 'Q4'))
order by customer_id, quarter;
FYI for those of us looking for how to unpivot in RedShift.
The long form solution given by Stew appears to be the only way to accomplish this.
For those who cannot see it there, here is the text pasted below:
We do not have built-in functions that will do pivot or unpivot. However,
you can always write SQL to do that.
create table sales (regionid integer, q1 integer, q2 integer, q3 integer, q4 integer);
insert into sales values (1,10,12,14,16), (2,20,22,24,26);
select * from sales order by regionid;
regionid | q1 | q2 | q3 | q4
----------+----+----+----+----
1 | 10 | 12 | 14 | 16
2 | 20 | 22 | 24 | 26
(2 rows)
pivot query
create table sales_pivoted (regionid, quarter, sales)
as
select regionid, 'Q1', q1 from sales
UNION ALL
select regionid, 'Q2', q2 from sales
UNION ALL
select regionid, 'Q3', q3 from sales
UNION ALL
select regionid, 'Q4', q4 from sales
;
select * from sales_pivoted order by regionid, quarter;
regionid | quarter | sales
----------+---------+-------
1 | Q1 | 10
1 | Q2 | 12
1 | Q3 | 14
1 | Q4 | 16
2 | Q1 | 20
2 | Q2 | 22
2 | Q3 | 24
2 | Q4 | 26
(8 rows)
unpivot query
select regionid, sum(Q1) as Q1, sum(Q2) as Q2, sum(Q3) as Q3, sum(Q4) as Q4
from
(select regionid,
case quarter when 'Q1' then sales else 0 end as Q1,
case quarter when 'Q2' then sales else 0 end as Q2,
case quarter when 'Q3' then sales else 0 end as Q3,
case quarter when 'Q4' then sales else 0 end as Q4
from sales_pivoted)
group by regionid
order by regionid;
regionid | q1 | q2 | q3 | q4
----------+----+----+----+----
1 | 10 | 12 | 14 | 16
2 | 20 | 22 | 24 | 26
(2 rows)
Hope this helps, Neil
Pulling slightly modified content from the link in the comment from #a_horse_with_no_name into an answer because it works:
Installing Hstore
If you don't have hstore installed and are running PostgreSQL 9.1+, you can use the handy
CREATE EXTENSION hstore;
For lower versions, look for the hstore.sql file in share/contrib and run in your database.
Assuming that your source (e.g., wide data) table has one 'id' column, named id_field, and any number of 'value' columns, all of the same type, the following will create an unpivoted view of that table.
CREATE VIEW vw_unpivot AS
SELECT id_field, (h).key AS column_name, (h).value AS column_value
FROM (
SELECT id_field, each(hstore(foo) - 'id_field'::text) AS h
FROM zcta5 as foo
) AS unpiv ;
This works with any number of 'value' columns. All of the resulting values will be text, unless you cast, e.g., (h).value::numeric.
Just use JSON:
with data (id, name) as (
values (1, 'a'), (2, 'b')
)
select t.*
from data, lateral jsonb_each_text(to_jsonb(data)) with ordinality as t
order by data.id, t.ordinality;
This yields
|key |value|ordinality|
|----|-----|----------|
|id |1 |1 |
|name|a |2 |
|id |2 |1 |
|name|b |2 |
dbfiddle
I wrote a horrible unpivot function for PostgreSQL. It's rather slow but it at least returns results like you'd expect an unpivot operation to.
https://cgsrv1.arrc.csiro.au/blog/2010/05/14/unpivotuncrosstab-in-postgresql/
Hopefully you can find it useful..
Depending on what you want to do... something like this can be helpful.
with wide_table as (
select 1 a, 2 b, 3 c
union all
select 4 a, 5 b, 6 c
)
select unnest(array[a,b,c]) from wide_table
You can use FROM UNNEST() array handling to UnPivot a dataset, tandem with a correlated subquery (works w/ PG 9.4).
FROM UNNEST() is more powerful & flexible than the typical method of using FROM (VALUES .... ) to unpivot datasets. This is b/c FROM UNNEST() is variadic (with n-ary arity). By using a correlated subquery the need for the lateral ORDINAL clause is eliminated, & Postgres keeps the resulting parallel columnar sets in the proper ordinal sequence.
This is, BTW, FAST -- in practical use spawning 8 million rows in < 15 seconds on a 24-core system.
WITH _students AS ( /** CTE **/
SELECT * FROM
( SELECT 'jane'::TEXT ,'doe'::TEXT , 1::INT
UNION
SELECT 'john'::TEXT ,'doe'::TEXT , 2::INT
UNION
SELECT 'jerry'::TEXT ,'roe'::TEXT , 3::INT
UNION
SELECT 'jodi'::TEXT ,'roe'::TEXT , 4::INT
) s ( fn, ln, id )
) /** end WITH **/
SELECT s.id
, ax.fanm -- field labels, now expanded to two rows
, ax.anm -- field data, now expanded to two rows
, ax.someval -- manually incl. data
, ax.rankednum -- manually assigned ranks
,ax.genser -- auto-generate ranks
FROM _students s
,UNNEST /** MULTI-UNNEST() BLOCK **/
(
( SELECT ARRAY[ fn, ln ]::text[] AS anm -- expanded into two rows by outer UNNEST()
/** CORRELATED SUBQUERY **/
FROM _students s2 WHERE s2.id = s.id -- outer relation
)
,( /** ordinal relationship preserved in variadic UNNEST() **/
SELECT ARRAY[ 'first name', 'last name' ]::text[] -- exp. into 2 rows
AS fanm
)
,( SELECT ARRAY[ 'z','x','y'] -- only 3 rows gen'd, but ordinal rela. kept
AS someval
)
,( SELECT ARRAY[ 1,2,3,4,5 ] -- 5 rows gen'd, ordinal rela. kept.
AS rankednum
)
,( SELECT ARRAY( /** you may go wild ... **/
SELECT generate_series(1, 15, 3 )
AS genser
)
)
) ax ( anm, fanm, someval, rankednum , genser )
;
RESULT SET:
+--------+----------------+-----------+----------+---------+-------
| id | fanm | anm | someval |rankednum| [ etc. ]
+--------+----------------+-----------+----------+---------+-------
| 2 | first name | john | z | 1 | .
| 2 | last name | doe | y | 2 | .
| 2 | [null] | [null] | x | 3 | .
| 2 | [null] | [null] | [null] | 4 | .
| 2 | [null] | [null] | [null] | 5 | .
| 1 | first name | jane | z | 1 | .
| 1 | last name | doe | y | 2 | .
| 1 | | | x | 3 | .
| 1 | | | | 4 | .
| 1 | | | | 5 | .
| 4 | first name | jodi | z | 1 | .
| 4 | last name | roe | y | 2 | .
| 4 | | | x | 3 | .
| 4 | | | | 4 | .
| 4 | | | | 5 | .
| 3 | first name | jerry | z | 1 | .
| 3 | last name | roe | y | 2 | .
| 3 | | | x | 3 | .
| 3 | | | | 4 | .
| 3 | | | | 5 | .
+--------+----------------+-----------+----------+---------+ ----
Here's a way that combines the hstore and CROSS JOIN approaches from other answers.
It's a modified version of my answer to a similar question, which is itself based on the method at https://blog.sql-workbench.eu/post/dynamic-unpivot/ and another answer to that question.
-- Example wide data with a column for each year...
WITH example_wide_data("id", "2001", "2002", "2003", "2004") AS (
VALUES
(1, 4, 5, 6, 7),
(2, 8, 9, 10, 11)
)
-- that is tided to have "year" and "value" columns
SELECT
id,
r.key AS year,
r.value AS value
FROM
example_wide_data w
CROSS JOIN
each(hstore(w.*)) AS r(key, value)
WHERE
-- This chooses columns that look like years
-- In other cases you might need a different condition
r.key ~ '^[0-9]{4}$';
It has a few benefits over other solutions:
By using hstore and not jsonb, it hopefully minimises issues with type conversions (although hstore does convert everything to text)
The columns don't need to be hard coded or known in advance. Here, columns are chosen by a regex on the name, but you could use any SQL logic based on the name, or even the value.
It doesn't require PL/pgSQL - it's all SQL