I have upgraded from PostgreSQL 9.1.5 to 9.2.1:
"PostgreSQL 9.1.5 on x86_64-unknown-linux-gnu, compiled by gcc (GCC) 4.4.6 20120305 (Red Hat 4.4.6-4), 64-bit"
"PostgreSQL 9.2.1 on x86_64-unknown-linux-gnu, compiled by gcc (GCC) 4.4.6 20120305 (Red Hat 4.4.6-4), 64-bit"
It is on the same machine with default PostgreSQL configuration files (only port was changed).
For testing purposes I have a simple table:
CREATE TEMP TABLE test_table_md_speed(id serial primary key, n integer);
Which I test using this function:
CREATE OR REPLACE FUNCTION TEST_DB_SPEED(cnt integer) RETURNS text AS $$
DECLARE
    time_start timestamp;
    time_stop  timestamp;
    time_total interval;
BEGIN
    time_start := cast(timeofday() AS TIMESTAMP);
    FOR i IN 1..cnt LOOP
        INSERT INTO test_table_md_speed(n) VALUES (i);
    END LOOP;
    time_stop := cast(timeofday() AS TIMESTAMP);
    time_total := time_stop - time_start;
    RETURN extract(milliseconds from time_total);
END;
$$ LANGUAGE plpgsql;
And I call:
SELECT test_db_speed(1000000);
I see strange results. For PostgreSQL 9.1.5 I get "8254.769", and for 9.2.1 I get "9022.219". This means that the new version is slower, and I cannot find out why.
Any ideas why those results differ?
You say both are on the same machine. Presumably the data files for the newer version were added later. Later files tend to be added closer to the center of the platter, where access speeds are slower.
There is a good section on this in Greg Smith's book on PostgreSQL performance, including ways to measure and graph the effect. With clever use of the dd utility you might be able to do some ad hoc tests of the relative speed at each location, at least for reads.
The 9.2 release generally scales up to a large number of cores better than earlier versions, although in some of the benchmarks there was a very slight reduction in the performance of a single query running alone. I didn't see any benchmarks showing an effect anywhere near this big, though; I would bet on it being the result of position on the drive -- which just goes to show how hard it can be to do good benchmarking.
UPDATE: A change made in 9.2.0 to improve performance for some queries made some other queries perform worse. Eventually it was determined that this change should be reverted, which happened in version 9.2.3; so it is worth checking performance after upgrading to that maintenance release. A proper fix, which has been confirmed to fix the problem the reverted patch fixed without causing a regression, will be included in 9.3.0.
I am facing a tricky error with a PostgreSQL database that suddenly popped up and that I cannot reproduce elsewhere.
The error appeared without any known maintenance or upgrade and seems to be tied to a specific database context.
Documentation
The bug seems to come and go over the years; here is a list of links found when searching the web for the error message:
February 2015: How to fix "InternalError: variable not found in subplan target list"
October 2017: query error: variable not found in subplan target list (when using PG 9.6.2)
February 2022: PGroonga index-only scan problem with yesterday’s PostgreSQL updates
June 2022 (my report): Sudden database error with COUNT(*) making Query Planner crashes: variable not found in subplan target list
The product version where I detected the error is:
SELECT version();
-- PostgreSQL 13.6 on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0, 64-bit
And the only extensions I have installed are:
SELECT extname, extversion FROM pg_extension;
-- "plpgsql" "1.0"
-- "postgis" "3.1.1"
Symptom
The error's main symptom is variable not found in subplan target list:
SELECT COUNT(*) FROM items;
-- ERROR: variable not found in subplan target list
-- SQL state: XX000
It does not affect all tables, just some specific ones.
What is interesting is that it is only partially broken:
SELECT COUNT(id) FROM items; -- 213
SELECT COUNT(*) FROM items WHERE id > 0; -- 213
It only affects the COUNT(*) aggregate, most probably because of the * placeholder.
Furthermore, the error is related to the query plan, not to the query itself, as:
EXPLAIN SELECT COUNT(*) FROM items;
-- ERROR: variable not found in subplan target list
-- SQL state: XX000
Fails as well without actually executing the query.
Digging into the PostgreSQL code on GitHub, the error message appears here and is raised when the function search_indexed_tlist_for_var returns nothing.
This should explain why it happens when using the * placeholder instead of an explicit column name.
Reproducibility
It is a tricky bug, simply because demonstrating that it exists is difficult, and the bug is somewhat vicious in that I cannot yet tell which conditions make it happen.
It seems this bug arises in specific contexts (e.g. a bug with an equivalent message and symptom was reported for the PGroonga extension), but in my case I cannot draw a parallel yet.
It is likely that I am facing an equivalent problem in a different context, but I have not succeeded in capturing a simple MCVE to pin it down.
CREATE TABLE t AS SELECT CAST(c AS text) FROM generate_series(1, 10000) AS c;
-- SELECT 10000
CREATE INDEX t_c ON t(c);
-- CREATE INDEX
VACUUM t;
-- VACUUM
SELECT COUNT(*) FROM t;
-- 10000
This works as expected. The table having the issue relies on a PostGIS index, but again I cannot reproduce the error:
CREATE EXTENSION postgis;
-- CREATE EXTENSION
CREATE TABLE test(
id serial,
geom geometry(Point, 4326)
);
-- CREATE TABLE
INSERT INTO test
SELECT x, ST_MakePoint(x/10000., x/10000.) FROM generate_series(1, 10000) AS x;
-- INSERT 0 10000
CREATE INDEX test_index ON test USING GIST(geom);
-- CREATE INDEX
VACUUM test;
-- VACUUM
SELECT COUNT(*) FROM test;
-- 10000
This also works as expected.
And when I dump and restore the faulty database, the problem vanishes.
Looking for an MCVE
When trying to reproduce the bug in order to build an MCVE and unit tests to highlight it and report it to the developers, I run into a limitation: when dumping the database and restoring it to a new instance, the bug simply vanishes.
So the only way I can reproduce this bug is with the original database, but I have not managed to prepare a dump that reproduces it elsewhere.
This is what it is all about: I am looking for hints on how to reproduce the bug in my context.
At this point my analysis is:
The bug is related to the database state or to some metadata that is not carried over when the dump is restored;
The bug is related to the COUNT function when using the * wildcard and there is no filtering clause;
The bug is not general, as it affects only specific tables with specific indexes;
The bug resides on the query planner side.
It seems that some metadata or state corruption prevents the query planner from finding a column name to apply the COUNT aggregate to.
Question
My question is: how can I investigate this bug more deeply in order to make it:
either reproducible (a dump technique that preserves it);
or understandable to a developer (catalog queries that identify where the problem resides in the database)?
Another way to phrase it would be:
How can I reproduce the context that makes the query planner crash?
Is there a way to make the planner more verbose in order to get more details on the error?
What queries can I run against the catalog to capture the faulty context?
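For instance, the kind of catalog snapshot I have in mind would be something like the following, run on both the faulty database and its restored copy so the output can be compared (the items table name comes from the symptom above; the exact columns worth comparing are only a guess on my part):
-- Indexes defined on the faulty table, including their validity flags
SELECT c.relname AS index_name,
       i.indisvalid, i.indisready, i.indislive,
       pg_get_indexdef(i.indexrelid) AS definition
FROM pg_index i
JOIN pg_class c ON c.oid = i.indexrelid
WHERE i.indrelid = 'items'::regclass;
-- Column entries of the table, including dropped columns still kept in the catalog
SELECT attnum, attname, atttypid::regtype AS type, attisdropped
FROM pg_attribute
WHERE attrelid = 'items'::regclass
ORDER BY attnum;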
I have a simple function; if I run it, it takes around 40 seconds to finish.
select * from f_cyklus1(100000000)
But if I run this function 8 times in 8 separate sessions, so that all 8 calls run in parallel, it takes around 210 to 260 seconds for each instance to finish, which is a massive drop in performance. I tried compiling it as 8 individual functions and running those instead, but that made no difference in performance.
select * from f_cyklus1(100000000);
select * from f_cyklus2(100000000);
select * from f_cyklus3(100000000);
select * from f_cyklus4(100000000);
select * from f_cyklus5(100000000);
select * from f_cyklus6(100000000);
select * from f_cyklus7(100000000);
select * from f_cyklus8(100000000);
So why does it take 210-260 s instead of 40 s to finish? Our virtual machine has 16 CPUs and the physical hardware was at low usage. I was also the only one using the PostgreSQL database at the time of testing.
create or replace function f_cyklus1 (p_rozsah int) returns bigint as -- drop function f_cyklus(int)
$body$
declare
    v_exc_context TEXT;
    v_result INTEGER;
    p_soucet bigint := 0;
begin
    for i in 0..p_rozsah
    loop
        p_soucet = p_soucet + i;
    end loop;
    return p_soucet;
EXCEPTION
    WHEN OTHERS THEN
        GET STACKED DIAGNOSTICS v_exc_context = PG_EXCEPTION_CONTEXT;
        PERFORM main.ut_log('ERR', SQLERRM || ' [SQL State: ' || SQLSTATE || '] Context: ' || v_exc_context);
        RAISE;
END;
$body$ LANGUAGE plpgsql;
PostgreSQL 11.6 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-39), 64-bit
Virtual machine: Centos 7 + KVM
HW: 2x AMD EPYC 7351 + 256 GB RAM
Note: I already asked a similar question where I thought the slowdown was due to asynchronous processing, but this shows the problem is actually in raw Postgres performance, so I deleted my former question and asked this new one.
p_soucet = p_soucet + i;
Each time you do this, it has to acquire a "snapshot" in which to run the statement, as it uses the regular SQL engine behind the scenes, and that engine always needs to run in a snapshot. Acquiring a snapshot requires a system-wide lock. The more processes you have running simultaneously, the more time they spend fighting to acquire the snapshot rather than doing useful work.
If you run the functions in transactions set to "repeatable read", you will find they scale better, because each keeps the same snapshot for the duration and re-uses it. Of course that might interfere with your real use case.
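For example, a quick way to test this, using the function names from the question, is to wrap each call in its own repeatable-read transaction (just a sketch; whether this is acceptable depends on what else your real workload does inside the transaction):
BEGIN ISOLATION LEVEL REPEATABLE READ;
SELECT * FROM f_cyklus1(100000000);  -- one of these per session, as before
COMMIT;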
plpgsql is not really well suited for this kind of work, scaling aside. You can use one of the other pl languages, like plperl or plpythonu.
How expressions are evaluated by the main SQL engine is described at https://www.postgresql.org/docs/current/plpgsql-expressions.html
Snapshots are discussed in general at the docs starting at https://www.postgresql.org/docs/current/mvcc.html
I am not aware that the interaction between the two is documented anywhere for end users.
When are DB2 declared global temporary tables 'cleaned up' and automatically deleted by the system...? This is for DB2 on AS400 v7r3m0, with DBeaver 5.2.5 as the dev client, and MS-Access 2007 for packaged apps for the end-users.
Today I started experimenting with a DGTT, thanks to this answer. So far I'm pleased with the functionality, although I did find our more recent system version has the WITH DATA option, which is an obvious advantage.
Everything is working, but at times I receive this error:
SQL Error [42710]: [SQL0601] NEW_PKG_SHEETS_DATA in QTEMP type *FILE already exists.
The meaning of the error is obvious, but the timing is not. When I started today, I could run the query multiple times, and the error didn't occur. It seemed as if the system was cleaning up and deleting it, which is just what I was looking for. But then the error started and now it's happening with more frequency.
If I make strategic use of DROP TABLE, this resolves the error, unless the table doesn't exist, in which case I get another error. I can also disconnect/reconnect to the server from my SQL dev client, as I would expect, since that would definitely drop the session.
This IBM article about DGTTs speaks much of sessions, but not many specifics. And this article is possibly the longest command syntax I've yet encountered in the IBM documentation. I got through it, but it didn't answer the question of what decides when a DGTT is deleted.
So I would like to ask:
What are the boundaries of a session..?
I'm thinking this is probably defined by the environment in my SQL client..?
I guess the best/safest thing to do is use DROP TABLE as needed..?
Does any one have any tips, tricks, or pointers they could share..?
Below is the SQL that I'm developing. For brevity, I've excluded chunks of the WITH-AS and SELECT statements:
DROP TABLE SESSION.NEW_PKG_SHEETS ;
DECLARE GLOBAL TEMPORARY TABLE SESSION.NEW_PKG_SHEETS_DATA
AS ( WITH FIRSTDAY AS (SELECT (YEAR(CURDATE() - 4 MONTHS) * 10000) +
(MONTH(CURDATE() - 4 MONTHS) * 100) AS DATEISO
FROM SYSIBM.SYSDUMMY1
-- <VARIETY OF ADDITIONAL CTE CLAUSES>
-- <SELECT STATEMENT BELOW IS A BIT LONGER>
SELECT DAACCT AS DAACCT,
DAIDAT AS DAIDAT,
DAINV# AS DAINV,
CAST(DAITEM AS NUMERIC(6)) AS DAPACK,
CAST(0 AS NUMERIC(14)) AS UPCNUM,
DAQTY AS DAQTY
FROM DAILYTRANS
AND DAIDAT >= (SELECT DATEISO+000 FROM FIRSTDAY) -- 1ST DAY FOUR MONTHS AGO
AND DAIDAT <= (SELECT DATEISO+399 FROM FIRSTDAY) -- LAST DAY OF LAST MONTH
) WITH DATA ;
DROP TABLE SESSION.NEW_PKG_SHEETS ;
The DGTT will only get cleaned up automatically by Db2 when the connection ends successfully (connect reset or equivalent, according to whatever interface to Db2 is being used).
For both Db2 for i and Db2-LUW, consider using the WITH REPLACE clause for the DECLARE GLOBAL TEMPORARY TABLE statement. That will ensure you don't need to explicitly drop the DGTT if the session remains open but the code needs the table to be replaced at next execution whether or not the DGTT already exists.
Using that WITH REPLACE clause means you do not need to worry about issuing a DROP statement for the DGTT, unless you really want to issue a drop.
Sometimes sessions may get re-used, or a close/disconnect might not happen or might not complete, or more likely a workstation performs a retry, and in those cases the WITH REPLACE can be essential for easily avoiding runtime errors.
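A minimal sketch of that pattern, reusing the table from the question with the fullselect shortened (double-check the clause placement against the DECLARE GLOBAL TEMPORARY TABLE syntax diagram for your release):
DECLARE GLOBAL TEMPORARY TABLE SESSION.NEW_PKG_SHEETS_DATA
   AS ( SELECT DAACCT, DAIDAT, DAINV#, DAQTY  -- same fullselect as above, trimmed here
        FROM DAILYTRANS )
   WITH DATA
   WITH REPLACE ;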
Note that Db2 for z/OS (at v12) does not offer the WITH REPLACE clause for DGTTs, but instead has an optional ON COMMIT DROP TABLE syntax (which is not documented for Db2 for i and Db2-LUW).
Background
A service runs the same tasks against a number of similar PostgreSQL instances. Most environments are on version 10, but some are on 9. Upgrading them is not an option, in the short term at least.
Problem
To improve performance, we used the PostgreSQL 10 feature CREATE STATISTICS. It works just fine on environments running v10, but is not supported on v9.
One way to deal with it could be to duplicate each script that uses CREATE STATISTICS, maintain a copy of it without those statements, and choose which script to run at the application level. I'd like to avoid that, as it's a lot of duplicated code to maintain.
I've tried to cheat by only creating the statistics if the script finds an appropriate version (code below), but on v9 it still gets picked up as a syntax error.
DO $$
-- server_version_num is a version number melted to an integer:
-- 9.6.6 -> 09.06.06 -> 90606
-- 10.0.1 -> 10.00.01 -> 100001
DECLARE pg_version int := current_setting('server_version_num');
BEGIN
IF pg_version >= 100000 THEN
CREATE STATISTICS table_1_related_col_group_a (NDISTINCT)
ON col_a1, col_a2
FROM schema_1.table_1;
CREATE STATISTICS table_2_related_col_group_b (NDISTINCT)
ON col_b1, col_b2, col_b3
FROM schema_1.table_2;
END IF;
END $$ LANGUAGE plpgsql;
Question
Is there a way to run a script that has an unsupported statement like CREATE STATISTICS in it without tipping postgres 9 off?
Use dynamic SQL. It won't be evaluated unless executed.
DO $$
-- server_version_num is a version number melted to an integer:
-- 9.6.6 -> 09.06.06 -> 90606
-- 10.0.1 -> 10.00.01 -> 100001
DECLARE pg_version int := current_setting('server_version_num');
BEGIN
IF pg_version >= 100000 THEN
EXECUTE 'CREATE STATISTICS table_1_related_col_group_a (NDISTINCT)
ON col_a1, col_a2
FROM schema_1.table_1;
CREATE STATISTICS table_2_related_col_group_b (NDISTINCT)
ON col_b1, col_b2, col_b3
FROM schema_1.table_2;';
END IF;
END $$ LANGUAGE plpgsql;
Could you tell me why this query works in pgAdmin, but doesn't with software using ODBC:
CREATE TEMP TABLE temp296 WITH (OIDS) ON COMMIT DROP AS
SELECT age_group AS a, male AS m, mode AS t, AVG(speed) AS avg_speed
FROM person JOIN info ON person.ppid=info.ppid
WHERE info.mode=2
GROUP BY age_group,male,mode;
SELECT age_group,male,mode,
CASE
WHEN age_group=1 AND male=0 THEN (info_dist_km/(SELECT avg_speed FROM temp296 WHERE a=1 AND m=0))*60
ELSE 0
END AS info_durn_min
FROM person JOIN info ON person.ppid=info.ppid
WHERE info.mode IN (7) AND info.info_dist_km>2;
I got "42P01: ERROR: relation "temp296" does not exist".
I have also tried wrapping it in "BEGIN; [...] COMMIT;", which gives "HY010: The cursor is open".
PostgreSQL 9.0.10, compiled by Visual C++ build 1500, 64-bit
psqlODBC 09.01.0200
Windows 7 x64
I think the reason it did not work for you is that, by default, ODBC works in autocommit mode. If you executed your statements serially, the very first statement
CREATE TEMP TABLE temp296 ON COMMIT DROP ... ;
must have autocommitted after finishing, and thus dropped your temp table.
Unfortunately, ODBC does not support directly using statements like BEGIN TRANSACTION; ... COMMIT; to handle transactions.
Instead, you can disable auto-commit using SQLSetConnectAttr function like this:
SQLSetConnectAttr(hdbc, SQL_ATTR_AUTOCOMMIT, SQL_AUTOCOMMIT_OFF, 0);
But, after you do that, you must remember to commit any change by using SQLEndTran like this:
SQLEndTran(SQL_HANDLE_DBC, hdbc, SQL_COMMIT);
While the WITH approach has worked for you as a workaround, it is worth noting that using transactions appropriately is faster than running in auto-commit mode.
For example, if you need to insert many rows into the table (thousands or millions), using transactions can be hundreds or thousands of times faster than autocommit.
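As a rough illustration of the idea in plain SQL (the table and values here are made up), the point is one commit around many statements instead of one commit per statement:
BEGIN;
INSERT INTO measurements VALUES (1, 10.5);  -- hypothetical table
INSERT INTO measurements VALUES (2, 11.2);
-- ... thousands more inserts in the same transaction ...
COMMIT;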
It is not uncommon for temporary tables not to be available via SQLPrepare/SQLExecute in ODBC, i.e., on prepared statements; MS SQL Server, for example, behaves like this. The solution is usually to use SQLExecDirect.